(57) Abstract

The invention provides two novel members of the cytochrome P450 2C subfamily of enzymes, designated 2C18 and 2C19. DNA
segments encoding these enzymes are also provided. The 2C19 polypeptide represents the principal human determinant of
S-mephenytoin 4'-hydroxylase activity. The invention also provides methods of identifying drugs metabolized by S-mephenytoin
4'-hydroxylase activity. Drugs shown to be metabolized by this activity should in general not be administered to individuals having, or belonging
to an ethnic group at risk of, a polymorphic deficiency in S-mephenytoin 4'-hydroxylase activity. The invention also provides methods of
diagnosing individuals having a polymorphic deficiency.

CLONING, EXPRESSION AND DIAGNOSIS OF
HUMAN CYTOCHROME P450 2C19:
THE PRINCIPAL DETERMINANT OF S-MEPHENYTOIN METABOLISM

TECHNICAL FIELD
The present invention relates generally to isolation
and exploitation of a novel member of the cytochrome P450 2C
subfamily of enzymes, 2C19, which is shown to be the principal
human determinant of S-mephenytoin metabolism. The
invention also relates to the isolation and exploitation of an
additional member of this family designated 2C18.

BACKGROUND OF THE INVENTION

The cytochromes P450 are a large family of
hemoprotein enzymes capable of metabolizing xenobiotics such
as drugs, carcinogens and environmental pollutants as well as
endobiotics such as steroids, fatty acids and prostaglandins.
Some members of the cytochrome P450 family are inducible in
both animals and cultured cells, while other forms are
constitutive. This group of enzymes has both harmful and
beneficial activities. Metabolic conversion of xenobiotics to
toxic, mutagenic and carcinogenic forms is a harmful activity.
Detoxification of some drugs and other xenobiotic substances
is a beneficial activity (Gelboin, Physiol. Rev. 60:1107-1).
A further beneficial activity is the metabolic processing of
some drugs to activated forms that have pharmacological
activity.

Genetic polymorphisms of P450 enzymes result in
phenotypically-distinct subpopulations that differ in their
ability to perform particular drug biotransformation
reactions. These phenotypic distinctions have important
implications for selection of drugs. For example, a drug that
is safe when administered to most humans may cause intolerable
side-effects in an individual suffering from a defect in a
P450 enzyme required for detoxification of the drug.
Alternatively, a drug that is effective in most humans may be
ineffective in a particular subpopulation because of lack of a
P450 enzyme required for conversion of the drug to a
metabolically active form. Accordingly, it is important for
both drug development and clinical use to screen drugs to
determine which P450 enzymes are required for activation
and/or detoxification of the drug. It is also important to
identify individuals who are deficient in a particular P450
enzyme.

A cytochrome P450 polymorphism of particular concern
results in reduced levels of S-mephenytoin 4'-hydroxylase
activity in certain subpopulations (Küpfer et al., Eur. J.
Clin. Pharmacol. 26:753-759 (1984); Wedlund et al., Clin.
Pharmacol. Ther. 36:773-780 (1984)). Two phenotypes, extensive
and poor metabolizers, are present in the human population.
Poor metabolizers are detected at low frequencies in
Caucasians (2-5%) but at higher frequencies in the Oriental
population (~20%) (Nakamura et al., Clin. Pharmacol. Ther.
38:402-408 (1985); Jurima et al., Br. J. Clin. Pharmacol.
19:483-487 (1985)) and blacks (~12%). 4'-Hydroxylation of
S-mephenytoin is 3-10 fold higher than that of the R-enantiomer
in extensive metabolizers, but the ratio is approximately 1 or
less in poor metabolizers (Yasumori et al., Mol. Pharmacol.
35:443-449 (1990)). Rates of S-mephenytoin 4'-hydroxylation in
liver microsomes are also much higher than those of
R-mephenytoin in extensive metabolizers.

There is some evidence that S-mephenytoin 4'-hydroxylase
activity resides in the cytochrome P450 2C family
of enzymes. A number of 2C human variants (designated 2C8,
2C9 and 2C10) have been partially purified and/or cloned.
See Shimada et al., J. Biol. Chem. 261:909-921 (1986); Kawano
et al., J. Biochem. (Tokyo) 102:493-501 (1987); Gut et al.,
Biochim. Biophys. Acta 884:435-447 (1986); Beaune et al.,
Biochim. Biophys. Acta 840:364-370 (1985); Ged et al.,
Biochemistry 27:6929-6940 (1988); Umbenhauer et al.,
Biochemistry 26:1094-1099 (1987); Kimura et al., Nucleic
Acids Res. 15:10053-10054 (1987); Shephard et al., Ann. Hum.
Genet. 53:23-31 (1989); Yasumori et al., J. Biochem. 102:1075-1082
(1987); Relling et al., J. Pharmacol. Exper. Ther. 252:442-447 (1990).
A comparison of the P450 2C cDNAs and their predicted amino
acid sequences shows that about 70% of the amino acids are
absolutely conserved among the human P450 2C subfamily. Some
regions of human P450 2C protein sequences have particularly
high conservation, and these regions may participate in
common P450 functions. Other regions show greater sequence
divergence and are likely responsible for different
substrate specificities between 2C members.

There has been considerable controversy as to
whether any of the known 2C members encodes the principal
human determinant of S-mephenytoin 4'-hydroxylase activity, in
which the polymorphism discussed above presumably resides.
The multiplicity and common properties of cytochromes P450
make it difficult to separate their different forms,
especially the minor forms. Even in situations where P450
cytochromes have been isolated in purified form by
conventional enzyme purification procedures, they have been
removed from the natural biological membrane association and
therefore require the addition of NADPH-cytochrome P450
reductase and other cell fractions for enzymatic activity.

The known members of the cytochrome P450 2C family
exhibit only low levels of S-mephenytoin 4'-hydroxylase
activity, if any. Moreover, such low levels of activity are
not specific for the S-enantiomer. For example, when the cDNA
isolated by Kimura et al. (1987), supra, was expressed in
HepG2 cells, it metabolized racemic and (R)-mephenytoin but
had no (S)-mephenytoin hydroxylase activity, suggesting that
the polymorphism in the metabolism of (S)-mephenytoin resides
in a different member of the P450 family. As a further
example, Yasumori et al. (1991), supra, reported that an
allelic variant of 2C9 (Arg144Tyr358Ile359Gly417) showed a low
level of catalytic activity toward S-mephenytoin in a
cDNA-directed yeast expression system. However, Srivastava et al., Mol.
Pharmacol. 40:69-69 (1991) expressed an identical cDNA in
yeast and an Arg144Cys358Ile359Asp417 variant (2C10 by present
nomenclature) but were unable to demonstrate catalytic
activity of 2C9 or 2C10 toward S-mephenytoin. Relling et al.,
J. Pharmacol. Exper. Ther. 252:442-447 (1990), were also
unable to demonstrate catalytic activity of an allelic variant
of 2C9 (Cys144Tyr358Ile359Gly417) toward S-mephenytoin using a
retroviral cDNA expression system in HepG2 cells. In
contrast, all of these 2C9 variants metabolized tolbutamide in
the various expression systems, confirming that failure to
observe S-mephenytoin 4'-hydroxylase activity was not due to
deficiencies in the expression system.

Based on the foregoing, it is apparent that a need
exists to identify and isolate the P450 2C family member
representing the principal determinant of S-mephenytoin
4'-hydroxylase activity in humans. There is also a need for
stable cell lines expressing the S-mephenytoin 4'-hydroxylase
activity. A need is also apparent for methods of screening
drugs for safety and efficacy in individuals deficient in
S-mephenytoin 4'-hydroxylase activity. There is also a need for
methods for diagnosing individuals deficient in S-mephenytoin
4'-hydroxylase activity. The present invention fulfills these
and other needs.

SUMMARY OF THE INVENTION
The invention provides purified cytochrome P450 2C19
polypeptides. The amino acid sequence of an exemplary P450
2C19 polypeptide is designated SEQ. ID. No. 1. Other
cytochrome P450 2C19 polypeptides usually comprise an amino
acid sequence having at least 97% sequence identity with the
exemplified sequence. Many of the 2C19 polypeptides of the
invention exhibit stereospecific S-mephenytoin 4'-hydroxylase
activity. The activity is typically at least about 1 nmol
mephenytoin per nmol of the purified polypeptide per minute.
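
The activity figure above is a turnover rate: product formed, divided by the amount of enzyme and the incubation time. The short Python sketch below shows only that arithmetic. The function name, argument names and example numbers are illustrative assumptions and are not taken from the specification.

```python
def turnover_rate(product_nmol: float, enzyme_nmol: float, minutes: float) -> float:
    """Return nmol product per nmol enzyme per minute.

    All three arguments are hypothetical assay read-outs; the
    specification gives only the resulting unit, not an assay protocol.
    """
    if enzyme_nmol <= 0 or minutes <= 0:
        raise ValueError("enzyme amount and incubation time must be positive")
    return product_nmol / (enzyme_nmol * minutes)


# Illustrative numbers only: 30 nmol product, 0.5 nmol enzyme, 20 min.
rate = turnover_rate(product_nmol=30.0, enzyme_nmol=0.5, minutes=20.0)
meets_typical_level = rate >= 1.0  # "at least about 1 nmol/nmol/min"
```

With these made-up inputs the rate is 3 nmol per nmol per minute, which would sit above the typical level stated in the text.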

The invention also provides purified cytochrome P450
2C18 polypeptides. The amino acid sequences of exemplary 2C18
polypeptides are designated SEQ. ID. Nos. 5 and 11.

In another aspect of the invention, purified DNA
segments encoding the P450 2C19 polypeptides described above
are provided. Some DNA segments encode the exemplary P450
2C19 having the amino acid sequence designated SEQ. ID.
No. 1. One such exemplary DNA segment is designated SEQ. ID.
No. 2. Other DNA segments encode the P450 2C18 polypeptides
described above. Exemplary DNA segments are designated SEQ.
ID. Nos. 6 and 12.

In a further aspect of the invention, stable cell
lines are provided. The cell lines comprise an exogenous DNA
segment encoding a cytochrome P450 2C19 polypeptide having at
least 97% sequence identity with the amino acid sequence
designated SEQ. ID. No. 1. The DNA segment is capable of
being expressed in the cell line. The cell lines preferably
produce high levels of the P450 2C19 polypeptide, such as
10-200 pmol of the polypeptide per mg of total microsomal
protein. Preferred cell lines are eukaryotic, including yeast
and insect cells.

The invention also provides methods of producing a
cytochrome P450 2C19 polypeptide. In these methods, a stable
cell line, as described above, is cultured under conditions
such that the DNA segment contained in the cell line is
expressed.

The invention also provides antibodies that
specifically bind to a 2C19 polypeptide comprising the amino
acid sequence designated SEQ. ID. No. 1. Preferred antibodies
are incapable of binding to nonallelic forms of 2C
polypeptides, such as 2C9.

In another aspect, the invention provides methods of
screening for a drug that is metabolized by S-mephenytoin
4'-hydroxylase activity. The drug is contacted with a cytochrome
P450 2C19 polypeptide. A metabolic product resulting from an
interaction between the drug and the polypeptide is detected. The presence
of the product indicates that the drug is metabolized by the
S-mephenytoin 4'-hydroxylase activity. The cytochrome P450
2C19 used in the methods may be substantially pure or may be a
component of a lysate of a stable cell line. The cytochrome
P450 2C19 polypeptide may also be a component of an intact
stable cell line. Some methods further comprise the steps of
contacting the drug with a liver extract comprising a mixture
of cytochrome P450 polypeptides, and detecting a metabolic
product resulting from an interaction between the drug and the
mixture of cytochrome P450 polypeptides.

The invention also provides methods of identifying a
mutagenic, carcinogenic or cytotoxic compound. In some
methods, the compound is contacted with a stable cell line
capable of expressing a 2C19 polypeptide, such as described
above. Mutagenic, carcinogenic or cytotoxic effects of the
compound on the cell line are assayed. In other methods, the
compound is contacted with a cytochrome P450 2C19 polypeptide
in a reaction mixture. A metabolic product is generated
resulting from S-mephenytoin 4'-hydroxylase activity on the
compound. The metabolic product is assayed for mutagenic,
carcinogenic or cytotoxic effects on a test cell line. The
effects indicate that the compound is mutagenic, carcinogenic
or cytotoxic. In some methods, the test cell line is added to
the reaction mixture before, during or after the contacting
step. The 2C19 polypeptide used in these methods can be
substantially pure or a component of a lysate of a stable cell
line. The 2C19 polypeptide can also be a component of an
intact stable cell line. Salmonella typhimurium is a
preferred cell line.

The invention also provides methods for testing the 
chemopreventive activity of an agent. A stable cell line 
capable of expressing a 2C19 polypeptide, such as described 
above, is contacted with an agent suspected of being 
chemopreventive in the presence of a carcinogen. The agent 
can be contacted with the cell line before addition of the 
carcinogen. Effects of the agent on the cell line that are 
indicative of chemopreventive activity are monitored. 

The invention also provides methods for determining
the metabolites activated by a carcinogen or xenobiotic. A
stable cell line capable of expressing a 2C19 polypeptide,
such as described above, is contacted with the suspected
carcinogen or xenobiotic. Metabolites and/or their effects
are identified.

The invention also provides methods of detecting a
cytochrome 2C19 polypeptide in a tissue sample. The tissue
sample is contacted with an antibody that specifically binds
to the 2C19 polypeptide, preferably without specifically
binding to nonallelic variants such as 2C9. Specific binding
between the antibody and the polypeptide is detected to
indicate the presence of the polypeptide.

In another aspect of the invention, methods of
diagnosing a patient having a deficiency in S-mephenytoin
4'-hydroxylase activity are provided. In these methods, a sample
of nucleic acids is obtained from the patient, and
a cytochrome P450 2C19 DNA sequence from the nucleic acids in
the sample is analyzed for the presence of a polymorphism
indicative of the deficiency. The most frequently occurring
polymorphisms in the P450 2C19 genes occur at nucleotides 681
and 636 of the 2C19 gene.

In some methods, the P450 2C19 DNA sequence subject
to analysis is genomic. In such methods, an amplifying step
is often primed from a forward primer sufficiently
complementary with a first subsequence of the antisense strand
of the 2C19 sequence to hybridize therewith, and a reverse
primer sufficiently complementary to a second subsequence of
the sense strand of the 2C19 sequence to hybridize therewith.
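
The strand relationships in the preceding paragraph can be sketched in Python. A forward primer that hybridizes to the antisense strand has the same sequence as a stretch of the sense strand, and a reverse primer that hybridizes to the sense strand is the reverse complement of a downstream stretch. The template below is a made-up toy sequence, not a 2C19 sequence, and the function names, primer length and spacing are illustrative assumptions.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def primer_pair(sense: str, site: int, primer_len: int = 10, gap: int = 3):
    """Pick primers flanking a 0-based position `site` on the sense strand.

    Returns (forward, reverse, amplicon). `primer_len` and `gap` are
    arbitrary illustrative values, not values from the specification.
    """
    fwd_end = site - gap
    fwd_start = fwd_end - primer_len
    rev_start = site + 1 + gap
    rev_end = rev_start + primer_len
    if fwd_start < 0 or rev_end > len(sense):
        raise ValueError("template too short for the requested primers")
    forward = sense[fwd_start:fwd_end]                      # same as sense strand
    reverse = reverse_complement(sense[rev_start:rev_end])  # binds sense strand
    amplicon = sense[fwd_start:rev_end]
    return forward, reverse, amplicon


# Hypothetical toy template; the site of interest is chosen arbitrarily.
TEMPLATE = "ATGCATTGACCTGAAGGCTTCCCGGGAACCTTGACTAGGCATTCAGG"
SITE = TEMPLATE.index("CCCGGG") + 3
forward, reverse, amplicon = primer_pair(TEMPLATE, SITE)
```

The amplicon begins with the forward primer and ends with the reverse complement of the reverse primer, so the position of interest always lies between the two primers.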

Some methods detect a polymorphism at nucleotide 681
of the coding region of the P450 2C19 DNA genomic sequence.
This can be achieved by selecting a forward primer that
hybridizes upstream from nucleotide 681 of the coding region,
and a reverse primer that hybridizes downstream from
nucleotide 681 of the coding region. Amplification products
generated from these primers can be analyzed by digesting the
amplified DNA segment with a restriction enzyme that
recognizes a site that includes nucleotide 681 of the coding
region.
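
The restriction-digest readout can be illustrated with a small Python sketch. It assumes the enzyme is SmaI, which the figure descriptions in this document name as the polymorphic site; SmaI recognizes CCCGGG and cuts between CCC and GGG to leave blunt ends. The two toy amplicons are hypothetical and are not taken from the sequence listing. They differ by a single G to A change inside the recognition site.

```python
def digest(seq: str, site: str = "CCCGGG", cut_offset: int = 3) -> list[str]:
    """Cut `seq` at every occurrence of `site`.

    `cut_offset` is the cut position within the site (3 gives CCC|GGG).
    Returns the fragments in 5' to 3' order.
    """
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


# Hypothetical amplicons: the mutant loses the site through a G -> A change.
WILDTYPE = "ATGCATTTCCCGGGAACCTTGA"
MUTANT = "ATGCATTTCCCAGGAACCTTGA"

wildtype_sizes = [len(f) for f in digest(WILDTYPE)]  # site present, two fragments
mutant_sizes = [len(f) for f in digest(MUTANT)]      # site lost, one fragment
```

In this sketch a wildtype amplicon yields two fragments, a mutant amplicon stays uncut, and a heterozygous sample would be expected to show all three sizes on a gel.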

Other methods detect a polymorphism at nucleotide
636 of the coding region of the P450 2C19 DNA genomic
sequence. This can be achieved using a forward primer that
hybridizes upstream from nucleotide 636 of the coding region,
and a reverse primer that hybridizes downstream of nucleotide
636 of the coding region. Amplification products are
conveniently analyzed by digestion with an enzyme that
recognizes a site that includes nucleotide 636 of the coding
region.

Other methods detect the 681 polymorphism by a
different approach involving selective amplification of the
wildtype or mutant allele. For example, for selective
amplification of the wildtype allele, a suitable forward
primer has about 10-50 contiguous nucleotides from the
wildtype 2C19 sequence shown in Fig. 16 including the
nucleotide at position 681 of the coding region. The forward
primer primes amplification from the complement of the
wildtype 2C19 sequence without priming amplification from the
complement of the mutant 2C19 sequence shown in Fig. 16.
Preferably, the 3' nucleotide of the forward primer is the
nucleotide at position 681. Analogously, the 681 mutant
allele can be amplified using a forward primer having
about 10-50 contiguous nucleotides from the mutant 2C19
sequence shown in Fig. 16 including the nucleotide at position
681 of the coding sequence. The forward primer primes
amplification from the complement of the mutant 2C19 sequence
without priming amplification from the complement of the
wildtype 2C19 sequence shown in Fig. 16.
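
Allele-selective priming can be sketched in Python under a deliberately simplified rule: a primer is treated as extendable only if it matches the template perfectly, including its 3' terminal base, which sits on the polymorphic position. Real primer behavior depends on annealing conditions and mismatch tolerance, which this sketch ignores. The toy sequences, the 12-nucleotide primer length (within the 10-50 range stated above) and the function names are illustrative assumptions.

```python
def allele_specific_primer(sense: str, snp_index: int, length: int = 12) -> str:
    """Return a forward primer whose 3' base is the base at `snp_index`."""
    start = snp_index - length + 1
    if start < 0:
        raise ValueError("not enough upstream sequence for the primer")
    return sense[start:snp_index + 1]


def primes(primer: str, sense: str, snp_index: int) -> bool:
    """Simplified model: priming requires a perfect match ending at the SNP."""
    start = snp_index - len(primer) + 1
    return start >= 0 and sense[start:snp_index + 1] == primer


# Hypothetical toy alleles differing only at 0-based index 11 (G vs A).
SNP_INDEX = 11
WILDTYPE = "ATGCATTTCCCGGGAACCTTGA"
MUTANT = WILDTYPE[:SNP_INDEX] + "A" + WILDTYPE[SNP_INDEX + 1:]

wt_primer = allele_specific_primer(WILDTYPE, SNP_INDEX)
mut_primer = allele_specific_primer(MUTANT, SNP_INDEX)
```

Under this model each primer supports amplification of only its own allele, so running both reactions on one sample would distinguish homozygous wildtype, heterozygous and homozygous mutant individuals.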

The invention also provides analogous methods for
detection of the 636 polymorphism.

In other methods, the segment of 2C19 DNA subject to
analysis is a cDNA sequence. The cDNA sequence is produced by
reverse transcribing mRNA in the sample.
In some methods for detecting the 681 polymorphism, the
forward primer comprises about 10-50 contiguous nucleotides
upstream of nucleotide 643 of the coding region of the
wildtype 2C19 cDNA sequence shown in Fig. 12 and hybridizes to
the complement of the 2C19 sequence upstream from nucleotide
643 of the coding region, and the reverse primer comprises
about 10-50 contiguous nucleotides from the complement of the
wildtype 2C19 cDNA sequence shown in Fig. 12 and hybridizes to
the 2C19 sequence downstream from nucleotide 682 of the coding
region. In other methods, the forward primer hybridizes to
the complement of the wildtype 2C19 cDNA sequence shown in
Fig. 12 between nucleotides 643 and 682 without hybridizing to
the complement of the mutant 2C19 cDNA sequence shown in
Fig. 12. In other methods, the reverse primer hybridizes to
the wildtype 2C19 cDNA sequence shown in Fig. 12 between
nucleotides 643 and 682 without hybridizing to the mutant 2C19
cDNA sequence shown in Fig. 12.

The invention provides analogous methods for
diagnosing the 636 polymorphism from cDNA. In some methods,
the forward primer comprises about 10-50 contiguous
nucleotides upstream of nucleotide 636 of the coding region of
the wildtype 2C19 cDNA sequence shown in Fig. 12, and the
reverse primer comprises about 10-50 contiguous nucleotides
from the complement of the wildtype 2C19 cDNA sequence shown
in Fig. 12 downstream from nucleotide 636 of the coding
region.

The invention also provides methods capable of 
detecting any polymorphism from cDNA. In these methods, the 
full-length 2C19 cDNA sequence is usually amplified. Analysis 
is often performed by sequencing a segment of the 2C19 cDNA 
amplification product. 

The invention provides further methods for
diagnosing polymorphisms in genomic DNA. In these methods,
genomic DNA is digested with a restriction enzyme that
recognizes a site that includes nucleotide 636 or 681 of the
coding region. The digestion products are then detected by
Southern blotting with a labelled segment of the 2C19 DNA
sequence as a probe.

In another aspect of the invention, diagnostic kits
are provided. Some diagnostic kits comprise forward and
reverse primers. The forward primer is sufficiently
complementary with a first subsequence of the antisense strand
of a double-stranded 2C19 genomic DNA sequence to hybridize
therewith, and the reverse primer is sufficiently complementary
with a second subsequence of the sense strand of the 2C19
genomic sequence to hybridize therewith. For example, in some
methods for diagnosis of the 681 polymorphism, the first
subsequence is upstream of nucleotide 681 of the coding
region, and the second subsequence is downstream of nucleotide 681
of the coding region. Similarly, in some methods for
diagnosis of the 636 polymorphism, the first subsequence is
upstream of nucleotide 636 of the coding region, and the
second subsequence is downstream of nucleotide 636 of the
coding region.

BRIEF DESCRIPTION OF THE DRAWINGS 
Figure 1 shows Western blots of human liver
microsomal proteins. Microsomal proteins were separated by
SDS-polyacrylamide gel electrophoresis. Blot A was performed
using polyclonal antibody to 2C9 and blot B with anti-2C8
(HLx). Each lane represents 20 µg of microsomal protein from
an individual liver. The 2C8 antibody also recognized
purified rat P450 2C13(g). cDNA libraries were constructed
from livers 860624 (low HLx) and S33 (high HLx).

Figure 2 contains nucleotide sequences of human P450
2C cDNAs. 2c (SEQ. ID. No. 14) is indicated in the top line
and represents the consensus sequence where information from
more than one sequence is available. Sequences were
determined by the dideoxy chain termination method. The
differences observed for clones 25 (SEQ. ID. No. 4) and 65
(SEQ. ID. No. 10) are underlined. The termination codons are
starred. The heme binding region and polyadenylation signals
are underlined. The one-base difference between 29c (SEQ. ID.
No. 6) and 6b (SEQ. ID. No. 12) is also underlined. The
termination codon is starred. The new allelic variant
proteins of 2C18, referred to as 29c (SEQ. ID. No. 5) and 6b
(SEQ. ID. No. 11), and the new protein of 2C19, referred to as
11a (SEQ. ID. No. 1), are compared with the protein of 2C8,
referred to as 2C8 (SEQ. ID. No. 7), and the allelic variant
proteins of 2C9, referred to as 65 (SEQ. ID. No. 9) and 25
(SEQ. ID. No. 3).

Figure 3 depicts a comparison of amino acid
sequences of cytochrome P450 2C8 allelic variants.

Figure 4 depicts a Western blot of recombinant
transformed COS-1 cells. Each lane represents microsomal
protein (50 µg) from an independent transformation with the
indicated P450 2C cDNA, mock-transfected cells (CON), 20 µg of
human liver microsomal protein (liver S5), or 2 pmol of pure
P450g (2C13).

Figure 5 shows a Northern blot of human mRNAs. Each
lane represents 10 µg of mRNA, and the blot was probed with
end-labeled T300R, an oligoprobe specific for 2C8 (SEQ. ID.
No. 8) (top), stripped, and reprobed with 32P-actin cDNA
(bottom).

Figure 6: Western blots of yeast microsomes
expressing recombinant P450 2C cDNAs. CON = control (yeast
microsomes lacking recombinant proteins).

Figure 7: Linearity of S-mephenytoin 4'-hydroxylase
activity and amount of recombinant cytochrome P450 2C19.

Figure 8: S-mephenytoin 4'-hydroxylase activity as
a function of the molar ratio of cytochrome to recombinant 
cytochrome P450. 

Figure 9: HPLC radiochromatograms of metabolites 
formed after incubation of labelled mephenytoin with P450 2C 
enzymes, human liver microsomes and yeast control. 

Figure 10: Comparison of liver content of
cytochrome P450 2C enzymes with S-mephenytoin 4'-hydroxylase
activity. The upper part of the figure shows Western blots of
liver samples from 16 individuals. The lower part of the
figure shows the S-mephenytoin 4'-hydroxylation activity and
ratios of S/R mephenytoin 4'-hydroxylase activity in each
sample.

Figure 11: Correlation between hepatic 2C19 content
and S-mephenytoin hydroxylase activity based on the data shown
in Figure 10.

Figure 12: Sequence alignment of PCR products from
normal and aberrantly spliced CYP2C19 cDNAs (SEQ. ID. Nos. 45
and 47), with the corresponding amino acid translations (SEQ.
ID. Nos. 46 and 48) indicated above and below the nucleotide
sequence. The new termination codon TAA in the aberrant cDNA
is indicated by the word END and the asterisk. The PCR
primers are indicated by the horizontal arrows in the
sequence. The aberrant CYP2C19 cDNA is missing 40 base pairs
of the cDNA in poor metabolizers as indicated by the dotted
line.
Figure 13: A. Diagram of strategy to amplify
CYP2C19 cDNA transcripts from human liver samples. The
sequence for the PCR primers is indicated in Fig. 12. This
strategy yielded a 284 bp band for the normal cDNA, a 244 bp
band for the aberrant cDNA and both bands with cDNA from
heterozygous individuals. The hatched area indicates the 40 bp
deleted in exon 5 of the aberrant cDNA. B. Relation
between genotype as assessed by reverse transcription PCR
(RT-PCR) of human liver mRNA, CYP2C19 protein estimated by
immunoblotting, S-mephenytoin hydroxylation activity, and the
ratio of metabolism of the R/S enantiomers. In vitro
phenotype was based on high (E), intermediate (I) or low (P)
S-mephenytoin 4'-hydroxylase activity.

Figure 14: A. Diagram showing strategy used to
genotype genomic DNA from human blood. B. Diagram of family
of propositus 61 (arrow) showing the pedigree and the gel of
SmaI-digested PCR products. C. Analysis of genomic DNA from
selected Caucasian subjects from the United States or from
Switzerland. The phenotype (EM, IM or PM) is indicated in the
brackets above the gel. D. Analysis of genomic DNA from
selected Oriental subjects.

Figure 15: A. Partial sequence of the intron
4/exon 5 junction of CYP2C19 in extensive and poor
metabolizers (SEQ. ID. Nos. 49 and 50). Intron sequences are
shown in lower case and exon sequences in capitals. The
nucleotides deleted in the aberrantly spliced cDNA are
indicated in bold. The polymorphic SmaI site is underlined in
2C19 (wt). The highly conserved AG residues at the intron/exon
junction are shown in black boxes. The consensus sequence
(IIYNGAGG) (Y=pyrimidine, R=purine, N=any base) for the 3'
splice site is indicated underneath the normal and cryptic
splice junctions. The branch point consensus sequence (CURAY)
is placed underneath two putative branch points. B.
Sequencing of PCR products of genomic DNA from three
individuals who were homozygous normal, heterozygous, and
homozygous defective (based on their SmaI restriction
digests). The polymorphic SmaI restriction site is indicated
by the bracket in the homozygous wt sequence. The G→A base
pair change corresponding to position 681 of the cDNA is also
indicated. C. Schematic representation of splicing in
wildtype CYP2C19 and in mutant CYP2C19. The black box indicates the 40 bp
that are deleted in exon 5 of poor metabolizers.

Figure 16: Additional 2C19 genomic sequence
flanking the 681 polymorphism. The wildtype (SEQ. ID. No. 51)
and mutant (SEQ. ID. No. 61) sequences are identical except
for the G/A transposition at nucleotide 681. Regions of
sequence ambiguity are indicated in lower case (n=any
nucleotide, k=G/T ambiguity, r=A/G ambiguity, m=A/C
ambiguity).

Figure 17: Genomic DNA sequence flanking the 636
polymorphism (also referred to as m2). Wildtype and mutant
sequences are designated SEQ. ID. Nos. 52 and 54 respectively.
Intron sequences are indicated in lower case and exons in
capitals. Translated amino acids (SEQ. ID. No. 53) are
indicated above the nucleotide sequence. The numbers
underneath the sequences indicate the first (482) and last
(642) nucleotides in exon 4. The two mutations found in exon
4 are indicated in bold. The aberrant stop codon is indicated
by the word "End." Exemplary primers for PCR amplification
are underlined.

Figure 18: Diagnosis of 636 mutation in 2C19. The
position of the PCR primers is indicated by arrows at 79-55
base pairs in intron 3 and 70-89 bp in intron 4. The size of
the PCR products expected in the wild type gene (wt) and the
size of the product in the 636 mutant allele are shown in the
bottom lines.

Figure 19: Simultaneous detection of the 636 and 
681 mutations. 
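
The RT-PCR readout described for Figure 13 (a 284 bp band for the normal cDNA, a 244 bp band for the aberrant cDNA, and both bands in heterozygous individuals) maps directly onto a genotype call. The Python sketch below shows one way to express that mapping. The size tolerance, the function name and the label strings are assumptions made for illustration; the document gives only the two expected band sizes.

```python
NORMAL_BP = 284    # expected band for normally spliced CYP2C19 cDNA
ABERRANT_BP = 244  # expected band for cDNA missing 40 bp of exon 5


def call_genotype(band_sizes: list[int], tolerance: int = 5) -> str:
    """Classify a sample from observed band sizes in base pairs.

    `tolerance` is a hypothetical allowance for gel sizing error.
    """
    has_normal = any(abs(b - NORMAL_BP) <= tolerance for b in band_sizes)
    has_aberrant = any(abs(b - ABERRANT_BP) <= tolerance for b in band_sizes)
    if has_normal and has_aberrant:
        return "heterozygous"
    if has_normal:
        return "homozygous normal"
    if has_aberrant:
        return "homozygous aberrant"
    return "no call"


# Illustrative band estimates, not measured data.
calls = [call_genotype(b) for b in ([284], [245, 283], [244], [500])]
```

A sample showing neither band returns "no call" rather than a genotype, since absence of product could reflect a failed reaction.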

DEFINITIONS 

Abbreviations for the twenty naturally occurring
amino acids follow conventional usage (Immunology - A
Synthesis (E.S. Golub & D.R. Gren, eds., Sinauer Associates,
Sunderland, MA, 2nd ed., 1991) (hereby incorporated by
reference for all purposes)). Stereoisomers (e.g., D-amino
acids) of the twenty conventional amino acids, unnatural amino
acids such as α,α-disubstituted amino acids, N-alkyl amino
acids, lactic acid, and other unconventional amino acids may
also be suitable components for polypeptides of the present
invention. Examples of unconventional amino acids include:
4-hydroxyproline, γ-carboxyglutamate, ε-N,N,N-trimethyllysine,
ε-N-acetyllysine, O-phosphoserine, N-acetylserine,
N-formylmethionine, 3-methylhistidine, 5-hydroxylysine,
ω-N-methylarginine, and other similar amino acids and imino acids
(e.g., 4-hydroxyproline). In the polypeptide notation used
herein, the left-hand direction is the amino-terminal
direction and the right-hand direction is the carboxy-terminal
direction, in accordance with standard usage and convention.
Similarly, unless specified otherwise, the left-hand end of
single-stranded polynucleotide sequences is the 5' end; the
left-hand direction of double-stranded polynucleotide sequences
is referred to as the 5' direction. The direction of 5' to 3'
addition of nascent RNA transcripts is referred to as the
transcription direction; sequence regions on the DNA strand
that are 5' to the 5' end of the RNA transcript are referred
to as "upstream sequences"; sequence regions on the DNA strand
that are 3' to the 3' end of the RNA transcript are referred
to as "downstream sequences".

The phrase "polynucleotide sequence" refers to a
single or double-stranded polymer of deoxyribonucleotide or
ribonucleotide bases read from the 5' to the 3' end. It
includes self-replicating plasmids, infectious polymers of DNA
or RNA and non-functional DNA or RNA.

The following terms are used to describe the 
sequence relationships between two or more polynucleotides: 
"reference sequence", "comparison window", "sequence 
identity", "percentage of sequence identity", and "substantial $ 

35 identity". A "reference sequence" is a defined sequence used 
as a basis for a sequence comparison; a reference sequence may 
be a subset of a larger sequence, for example, as a segment of 
a full-length cDNA or gene sequence given in a sequence 
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listing, such as a polynucleotide sequence shown in SEQ. ID, 
NO. 2 or may comprise a complete cDNA or gene sequence. 
Generally, a reference sequence is at least 20 nucleotides in 
length, frequently at least 25 nucleotides in length, and 
5 often at least 50 nucleotides in length. Since two 

polynucleotides may each (1) comprise a sequence (i.e., a 
portion of the complete polynucleotide sequence) that is 
similar between the two polynucleotides, and (2) may further 
comprise a sequence that is divergent between the two 

10 polynucleotides, sequence comparisons between two (or more) 

polynucleotides are typically performed by comparing sequences 
of the two polynucleotides over a "comparison window" to 
identify and compare local regions of sequence similarity. A 
"comparison window", as used herein, refers to a conceptual 

15 segment of at least 20 contiguous nucleotide positions wherein 
a polynucleotide sequence may be compared to a reference 
sequence of at least 20 contiguous nucleotides and wherein the 
portion of the polynucleotide sequence in the comparison 
window may comprise additions or deletions (i.e., gaps) of 20
percent or less as compared to the reference sequence (which
does not comprise additions or deletions) for optimal
alignment of the two sequences. Optimal alignment of
sequences for aligning a comparison window may be conducted by
the local homology algorithm of Smith & Waterman, Adv. Appl.
Math. 2:482 (1981), by the homology alignment algorithm of
Needleman & Wunsch, J. Mol. Biol. 48:443 (1970), by the search
for similarity method of Pearson & Lipman, Proc. Natl. Acad.
Sci. (USA) 85:2444 (1988), by computerized implementations of
these algorithms (FASTDB (Intelligenetics), BLAST (National
Center for Biotechnology Information) or GAP, BESTFIT, FASTA,
and TFASTA (Wisconsin Genetics Software Package Release 7.0,
Genetics Computer Group, 575 Science Dr., Madison, WI)), or by
inspection, and the best alignment (i.e., resulting in the
highest percentage of sequence similarity over the comparison
window) generated by the various methods is selected. The
term "sequence identity" means that two polynucleotide 
sequences are identical (i.e., on a nucleotide-by-nucleotide 
basis) over the window of comparison. The term "percentage of
sequence identity" (also sometimes referred to as "percentage
homology") is calculated by comparing two optimally aligned
sequences over the window of comparison, determining the
number of positions at which the identical nucleic acid base
(e.g., A, T, C, G, U, or I) occurs in both sequences to yield
the number of matched positions, dividing the number of
matched positions by the total number of positions in the
window of comparison (i.e., the window size), and multiplying
the result by 100 to yield the percentage of sequence
identity. The term "substantial identity" as used herein
denotes a characteristic of a polynucleotide sequence, wherein
the polynucleotide comprises a sequence that has at least 85 
percent sequence identity, preferably at least 96 percent 
sequence identity, more usually at least 97, 98 or 99 percent
sequence identity as compared to a reference sequence over a
comparison window of at least 20 nucleotide positions,
frequently over a window of at least 25-50 nucleotides,
wherein the percentage of sequence identity is calculated by
comparing the reference sequence to the polynucleotide
sequence which may include deletions or additions which total
20 percent or less of the reference sequence over the window
of comparison. The reference sequence may be a subset of a
larger sequence, for example, as a segment of the full-length
sequence of SEQ. ID. Nos. 2, 6 or 12.
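
The calculation of percentage of sequence identity defined above can be sketched as follows. This is a minimal illustration only: it assumes the two sequences have already been optimally aligned by one of the cited programs and padded with a gap character, and the function names, the "-" gap symbol and the default thresholds are illustrative choices rather than part of the specification.

```python
def percent_identity(aligned_ref: str, aligned_query: str, gap: str = "-") -> float:
    """Percentage of sequence identity over a pre-aligned comparison window.

    Matched positions are divided by the total number of positions in the
    window (the window size) and multiplied by 100, as in the definition.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    window = len(aligned_ref)
    if window == 0:
        raise ValueError("comparison window is empty")
    matches = sum(
        1
        for r, q in zip(aligned_ref.upper(), aligned_query.upper())
        if r == q and r != gap  # a gap never counts as a matched base
    )
    return 100.0 * matches / window


def has_substantial_identity(aligned_ref: str, aligned_query: str,
                             threshold: float = 85.0, min_window: int = 20) -> bool:
    """True if identity meets the threshold over a window of at least min_window positions."""
    if len(aligned_ref) < min_window:
        return False
    return percent_identity(aligned_ref, aligned_query) >= threshold


if __name__ == "__main__":
    ref = "ATGGATCCAGCTGTGGCTCT"    # 20-nucleotide window
    query = "ATGGATCCAGCTGTGGCTGA"  # differs at the last two positions
    print(percent_identity(ref, query))
    print(has_substantial_identity(ref, query))
```

Raising `threshold` to 96, 97, 98 or 99 gives the more preferred levels of identity recited above.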

As applied to polypeptides, the term "substantial
identity" (or "substantial homology") means that two peptide
sequences, when optimally aligned, such as by the programs
BLAZE (Intelligenetics), GAP or BESTFIT using default gap
weights, share at least 85% sequence identity, preferably at
least 96 percent sequence identity, more preferably at least
97, 98 or 99 percent sequence identity or more (e.g., 99.5
percent sequence identity). Preferably, residue positions
which are not identical differ by conservative amino acid
substitutions. Conservative amino acid substitutions refer to
the interchangeability of residues having similar side chains.
For example, a group of amino acids having aliphatic side 
chains is glycine, alanine, valine, leucine, and isoleucine; a 
group of amino acids having aliphatic-hydroxyl side chains is
serine and threonine; a group of amino acids having
amide-containing side chains is asparagine and glutamine; a group of
amino acids having aromatic side chains is phenylalanine,
tyrosine, and tryptophan; a group of amino acids having basic
side chains is lysine, arginine, and histidine; and a group of
amino acids having sulfur-containing side chains is cysteine
and methionine. Preferred conservative amino acid
substitution groups are: valine-leucine-isoleucine,
phenylalanine-tyrosine, lysine-arginine, alanine-valine, and
asparagine-glutamine.
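
The side-chain groups listed above lend themselves to a simple lookup. The sketch below encodes the six groups with standard one-letter amino acid codes and tests whether a substitution stays within one group; the dictionary keys and the function name are illustrative and do not appear in the specification.

```python
# Side-chain groups from the text, written with standard one-letter codes.
SIDE_CHAIN_GROUPS = {
    "aliphatic": set("GAVLI"),        # glycine, alanine, valine, leucine, isoleucine
    "aliphatic-hydroxyl": set("ST"),  # serine, threonine
    "amide-containing": set("NQ"),    # asparagine, glutamine
    "aromatic": set("FYW"),           # phenylalanine, tyrosine, tryptophan
    "basic": set("KRH"),              # lysine, arginine, histidine
    "sulfur-containing": set("CM"),   # cysteine, methionine
}


def is_conservative(residue_a: str, residue_b: str) -> bool:
    """True if two different residues belong to the same side-chain group."""
    a, b = residue_a.upper(), residue_b.upper()
    if a == b:
        return False  # identical residues are not a substitution
    return any(a in group and b in group for group in SIDE_CHAIN_GROUPS.values())


if __name__ == "__main__":
    for pair in [("V", "L"), ("S", "T"), ("K", "D")]:
        print(pair, is_conservative(*pair))
```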

The term "substantially pure" means an object 
species is the predominant species present (i.e., on a molar 
basis it is more abundant than any other individual species in 
the composition), and preferably a substantially purified
fraction is a composition wherein the object species comprises
at least about 50 percent (on a molar basis) of all
macromolecular species present. Generally, a substantially
pure composition will comprise more than about 80 to 90
percent of all macromolecular species present in the
composition. Most preferably, the object species is purified
to essential homogeneity (contaminant species cannot be 
detected in the composition by conventional detection methods) 
wherein the composition consists essentially of a single 
macromolecular species. 

The term "naturally-occurring" as used herein as
applied to an object refers to the fact that an object can be
found in nature. For example, a polypeptide or polynucleotide
sequence that is present in an organism (including viruses)
that can be isolated from a source in nature and which has not
been intentionally modified by man in the laboratory is
naturally-occurring.

The term "epitope" includes any protein determinant 
capable of specific binding to an immunoglobulin or T-cell 
receptor. Epitopic determinants usually consist of chemically
active surface groupings of molecules such as amino acids or
sugar side chains and usually have specific three dimensional
structural characteristics, as well as specific charge
characteristics. Specific binding exists when the dissociation
constant for a dimeric complex is ≤ 1 µM, preferably ≤ 100 nM
and most preferably ≤ 1 nM.
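
Reading the three values above as upper bounds on the dissociation constant (1 µM, 100 nM and 1 nM), the tiers of specific binding can be expressed as a small classifier. The function name and the tier labels are illustrative assumptions.

```python
def binding_class(kd_molar: float) -> str:
    """Classify a dissociation constant (in mol/L) against the stated bounds."""
    if kd_molar <= 0:
        raise ValueError("dissociation constant must be positive")
    if kd_molar <= 1e-9:    # at most 1 nM
        return "most preferred"
    if kd_molar <= 100e-9:  # at most 100 nM
        return "preferred"
    if kd_molar <= 1e-6:    # at most 1 µM
        return "specific"
    return "not specific"


if __name__ == "__main__":
    for kd in (5e-10, 5e-8, 5e-7, 1e-5):
        print(kd, binding_class(kd))
```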

The term "allelic variants" refers to gene sequences
mapping to the same chromosomal location in different
individuals in a species but showing a small degree of sequence
divergence from each other. Typically, allelic variants 
encode polypeptides exhibiting at least 96% or 97% amino acid 
sequence identity with each other. 

The term "nonallelic variants" refers to gene 
sequences that show similar structural and/or functional
properties but map at different chromosomal locations in an 
individual. In the 2C family, nonallelic variants typically 
exhibit 70-96% amino acid sequence identity with each other. 

The term "cognate variants" refers to gene sequences 
that are evolutionarily and functionally related between 
humans and other species such as primates, porcines, bovines 
and rodents such as mice and rats. Thus, the cognate primate 
gene to a human 2C19 gene is the primate gene which encodes an 
expressed protein which has the greatest degree of sequence 
identity to the 2C19 protein and which exhibits an expression 
pattern similar to that of the 2C19 protein.

Stringent conditions are sequence dependent and will 
be different in different circumstances. Generally, stringent 
conditions are selected to be about 5°C lower than the
thermal melting point (Tm) for the specific sequence at a
defined ionic strength and pH. The Tm is the temperature
(under defined ionic strength and pH) at which 50% of the
target sequence hybridizes to a perfectly matched probe.
Typically, stringent conditions will be those in which the
salt concentration is at least about 0.02 molar at pH 7 and
the temperature is at least about 60°C. As other factors may
significantly affect the stringency of hybridization, 
including, among others, base composition and size of the 
complementary strands, the presence of organic solvents and 
the extent of base mismatching, the combination of parameters
is more important than the absolute measure of any one.

A polymorphism is a condition in which two or more
different nucleotide sequences coexist in the same
interbreeding population in a DNA sequence.

The term "oligonucleotide" refers to a molecule
comprised of two or more deoxyribonucleotides or
ribonucleotides, such as primers, probes, nucleic acid
fragments to be detected, and nucleic acid controls. The
exact size of an oligonucleotide depends on many factors and
the ultimate function or use of the oligonucleotide.
Oligonucleotides can be prepared by any suitable method,
including, for example, cloning and restriction of appropriate
sequences and direct chemical synthesis by a method such as
the phosphotriester method of Narang et al., Meth. Enzymol.
68:90-99 (1979); the phosphodiester method of Brown et al.,
Meth. Enzymol. 68:109-151 (1979); the diethylphosphoramidite
method of Beaucage et al., Tetrahedron Lett. 22:1859-1862
(1981); and the solid support method of U.S. Patent No.
4,458,066.

A primer is an oligonucleotide, whether natural or
synthetic, capable of acting as a point of initiation of DNA
synthesis under conditions in which synthesis of a primer
extension product complementary to a nucleic acid strand is
induced, i.e., in the presence of four different nucleoside
triphosphates and an agent for polymerization (i.e., DNA
polymerase or reverse transcriptase) in an appropriate buffer
and at a suitable temperature.

"Probe" refers to an oligonucleotide which binds 
through complementary base pairing to a subsequence of a
target nucleic acid. Probes will typically hybridize to
target sequences lacking complete complementarity with the
probe sequence on reducing the stringency of the hybridization
conditions. The probes are preferably directly labelled, as
with isotopes, or indirectly labelled, such as with biotin to
which a streptavidin complex may later bind. By assaying for
the presence or absence of the probe, one can detect the
presence or absence of the target.

"Subsequence" refers to a sequence of nucleic acids 
that comprises a part of a longer sequence of nucleic acids.

The term "target region" refers to a region of a
nucleic acid to be analyzed such as a polymorphic region.

Hybridization refers to binding between an
oligonucleotide and a target sequence via complementary base
pairing to achieve the desired priming by PCR polymerases or
detection of hybridization signal, and sometimes embraces
minor mismatches that can be accommodated by reducing the
stringency of the hybridization conditions.

DESCRIPTION OF THE SPECIFIC EMBODIMENTS

The invention provides novel cytochrome P450 2C 
polypeptides, DNA fragments encoding these polypeptides and 
cell lines expressing the polypeptides. The invention also
provides methods of using the novel polypeptides for, inter
alia, identifying drugs metabolized by S-mephenytoin
4'-hydroxylase activity.

I. Polypeptides 

In one embodiment, the invention provides novel
cytochrome P450 2C polypeptides, designated 2C18 and 2C19.
The 2C18 and 2C19 proteins are nonallelic with each other and
with known 2C polypeptides. An exemplary 2C19 polypeptide has
the amino acid sequence designated SEQ. ID. No. 1. The
invention also provides allelic variants of the exemplified
2C19 polypeptide, and natural and induced mutants of such
variants. The invention provides human 2C19 polypeptides and
cognate variants thereof. Typically, 2C19 variants exhibit
substantial sequence identity (e.g., at least 96% or 97% amino
acid sequence identity) with the exemplified 2C19 polypeptide
and cross-react with antibodies specific to this polypeptide.
2C19 variants are usually encoded by nucleic acids that show
substantial sequence identity (e.g., at least 96% or 97%
sequence identity) with the nucleic acid encoding the
exemplified 2C19 variant (SEQ. ID. No. 2).

Some 2C19 polypeptides, including the exemplified
polypeptide, exhibit high levels of stereospecific
S-mephenytoin 4'-hydroxylase activity. See Table IV. Indeed,
it is highly probable that 2C19 represents the principal human
determinant of this activity. Typically such 2C19
polypeptides exhibit a stereospecific S-mephenytoin
4'-hydroxylase activity of about 0.5-100, 1-10 or about 4-6 nmol
S-mephenytoin per nmol 2C19 polypeptide per minute.
Frequently, the activity of 2C19 polypeptides is higher than
that of native human liver microsomes. The activity of such
polypeptides for the R-enantiomer of mephenytoin is typically
at least 10, 50 or 100-fold lower.

Other 2C19 polypeptides may lack substantial
stereospecific S-mephenytoin 4'-hydroxylase activity. Such
polypeptides represent allelic variants of the exemplified
2C19 polypeptide. These polypeptides sometimes exhibit low
levels of mephenytoin 4'-hydroxylase activity (i.e., less than
about 0.5 or 0.2 nmol mephenytoin per nmol 2C19 polypeptide
per minute). This activity may or may not be
stereospecific. Although the presence of a 2C19 polypeptide
with low enzymic activity could account for the phenotype of a
few individuals defective in S-mephenytoin 4'-hydroxylase
activity, the phenotype in most such individuals results from
a complete or substantial absence of 2C19 polypeptide. See,
e.g., Figure 10.
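
The activity figures quoted for 2C19 polypeptides are turnover rates (nmol product per nmol polypeptide per minute), and stereospecificity is the ratio of the rates for the S and R enantiomers. The sketch below shows the arithmetic. The function names are illustrative, the 0.5 cutoff is the boundary between the "high" and "low" ranges given above, and the numbers in the example are invented inputs, not measured data.

```python
def turnover(product_nmol: float, p450_nmol: float, minutes: float) -> float:
    """Activity in nmol product per nmol P450 polypeptide per minute."""
    if p450_nmol <= 0 or minutes <= 0:
        raise ValueError("enzyme amount and incubation time must be positive")
    return product_nmol / (p450_nmol * minutes)


def stereoselectivity(s_rate: float, r_rate: float) -> float:
    """Fold preference for the S enantiomer over the R enantiomer."""
    if r_rate <= 0:
        raise ValueError("R-enantiomer rate must be positive to form a ratio")
    return s_rate / r_rate


def activity_level(s_rate: float, low_cutoff: float = 0.5) -> str:
    """'high' at or above about 0.5 nmol/nmol/min, otherwise 'low'."""
    return "high" if s_rate >= low_cutoff else "low"


if __name__ == "__main__":
    # Hypothetical incubation: 0.5 nmol enzyme, 30 min, 75 nmol 4'-hydroxy product.
    s_rate = turnover(75.0, 0.5, 30.0)
    r_rate = turnover(1.5, 0.5, 30.0)
    print(s_rate, activity_level(s_rate), stereoselectivity(s_rate, r_rate))
```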

The invention also provides 2C18 polypeptides. The
amino acid sequences of two allelic variants of 2C18 are
designated SEQ. ID. Nos. 5 and 11. Also provided are allelic
variants of the exemplified 2C18 polypeptides, conjugated
variants thereof, and natural and induced mutants of any of
these. Typically, 2C18 variants exhibit substantial sequence
identity (e.g., at least 96% or 97% amino acid sequence
identity) with the exemplified 2C18 polypeptides and
cross-react with antibodies specific to these polypeptides. 2C18
variants are usually encoded by nucleic acids that show
substantial sequence identity (e.g., at least 96% or 97%
sequence identity) with the nucleic acid encoding the
exemplified 2C18 variants (SEQ. ID. Nos. 6 and 12).

2C18 polypeptides typically show low levels of
mephenytoin 4'-hydroxylase activity (0.01-0.2 nmol mephenytoin
per nmol 2C18 polypeptide per min). For some 2C18
polypeptides, the activity shows a small degree of
stereoselectivity (up to about five fold). However, by
contrast to the 2C19 polypeptides, such stereoselectivity as
is shown by 2C18 polypeptides is in favor of the R enantiomer.
Some variants of 2C18 show high levels of a distinct enzymic
activity, namely, tolbutamide hydroxylase activity (e.g.,
about 50-200 pmol tolbutamide per nmol 2C18 polypeptide per
min). Conceivably, some variants of 2C18 exhibit novel
enzymic or regulatory functions not shared by other 2C family
members.

Besides substantially full-length polypeptides, the
present invention provides fragments of full-length 2C18 and
2C19 polypeptides. Some such fragments share the enzymic
activity of a full-length polypeptide. A segment of a
full-length 2C18 or 2C19 polypeptide will ordinarily comprise at
least 50 contiguous amino acids and more usually, 100, 200 or
400 contiguous amino acids from one of the exemplified
polypeptide sequences, designated SEQ. ID. Nos. 1, 5 and 11.
Fragments of full-length 2C18 and 2C19 polypeptides are often
terminated at one or both of their ends near (i.e., within
about 5, 10 or 20 aa of) the boundaries of functional or
structural domains. Fragments are useful for, inter alia,
generating antibodies specific to a 2C19 or 2C18 polypeptide.
Fragments consisting essentially of the hypervariable regions
of these polypeptides are preferred immunogens for
generating antibodies specific to a particular allelic
variant.

II. Nucleic Acid Fragments 

In another aspect of the invention, nucleic acid
fragments are provided. An exemplified cDNA sequence encoding a
2C19 polypeptide is designated SEQ. ID. No. 2. Exemplified
cDNA sequences encoding two variant 2C18 polypeptides are
designated SEQ. ID. Nos. 6 and 12. The exemplified sequences
include both translated regions and 3' and 5' flanking
regions. The exemplified sequence data can be used to design
probes for other DNA fragments encoding 2C18 or 2C19
polypeptides (or fragments thereof). These DNA fragments
include human genomic clones, cDNAs and genomic clones from
other species, allelic variants, and natural and induced
mutants of any of these. Specifically, all nucleic acid
fragments encoding all 2C18 and 2C19 polypeptides disclosed in
this application are provided. Genomic libraries of many
species are commercially available (e.g., Clontech, Palo Alto,
CA), or can be isolated de novo by conventional procedures.
cDNA libraries are best prepared from liver extracts.

The probes used for isolating clones typically
comprise a sequence of at least about 15, 20 or 25 contiguous
nucleotides (or their complement) of an exemplified DNA
sequence (i.e., SEQ. ID. Nos. 2, 6 or 12). Preferably probes
are selected from regions of the exemplified sequences that
show a high degree of variation between different 2C
nonallelic variants. Hypervariable regions are the nucleic
acids encoding amino acids 181-210, 220-248, 283-296 and
461-479. Probes from these regions are likely to hybridize to
allelic variants but not to nonallelic variants of the
exemplified sequences under stringent conditions. Allelic
variants can be isolated by hybridization screening of plaque
lifts (Benton & Davis, Science 196:180 (1978)). Alternatively,
cDNAs can be prepared from liver mRNA by polymerase chain
reaction (PCR) methods. 5'- and 3'-specific primers for 2C19
are designed based on the nucleotide sequence designated SEQ.
ID. No. 2. See generally PCR Technology: Principles and
Applications for DNA Amplification (ed. H.A. Erlich, Freeman
Press, NY, NY, 1992); PCR Protocols: A Guide to Methods and
Applications (eds. Innis, et al., Academic Press, San Diego,
CA, 1990); Mattila et al., Nucleic Acids Res. 19:4967 (1991);
Eckert et al., PCR Methods and Applications 1:17 (1991); PCR
(eds. McPherson et al., IRL Press, Oxford); and U.S. Patent
4,683,202 (each of which is incorporated by reference for all
purposes).
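
The hypervariable regions above are given as amino acid ranges, whereas probes are chosen as nucleotide stretches. The conversion can be sketched as below, under the assumption that nucleotide positions are counted within the coding sequence with the A of the initiating ATG as position 1. The offset of the coding sequence within SEQ. ID. No. 2, which includes 5' flanking sequence, is not stated in this passage and is left out. Function names and the 20-nucleotide default probe length are illustrative.

```python
# Hypervariable regions as amino acid residue ranges (inclusive), from the text.
HYPERVARIABLE_AA = [(181, 210), (220, 248), (283, 296), (461, 479)]


def aa_range_to_cds(aa_start: int, aa_end: int) -> tuple:
    """Map an inclusive residue range to inclusive 1-based coding-sequence positions."""
    if aa_start < 1 or aa_end < aa_start:
        raise ValueError("invalid residue range")
    return 3 * (aa_start - 1) + 1, 3 * aa_end


def probe_windows(aa_start: int, aa_end: int, probe_len: int = 20) -> list:
    """All contiguous probe windows of probe_len nucleotides lying wholly inside the region."""
    nt_start, nt_end = aa_range_to_cds(aa_start, aa_end)
    last_start = nt_end - probe_len + 1
    return [(s, s + probe_len - 1) for s in range(nt_start, last_start + 1)]


if __name__ == "__main__":
    for region in HYPERVARIABLE_AA:
        nt_range = aa_range_to_cds(*region)
        print(region, nt_range, len(probe_windows(*region)), "candidate 20-mers")
```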

Nucleotide substitutions, deletions, and additions 
can be incorporated into the polynucleotides of the invention. 
Nucleotide sequence variation may result from degeneracy of
the genetic code, from sequence polymorphisms of 2C18 and 2C19
alleles, or from minor sequencing errors, or may be introduced by
random mutagenesis of the encoding nucleic acids using
irradiation or exposure to EMS, or by changes engineered by
site-specific mutagenesis or other techniques. See Sambrook
et al., Molecular Cloning: A Laboratory Manual (C.S.H.P.
Press, NY 2d ed., 1989) (incorporated by reference for all
purposes).

III. Cell Lines 

In another embodiment of the invention, cell lines
capable of expressing the nucleic acid segments described
above are provided. Stable cell lines are preferred to cell
lines conferring transient expression. Stable cell lines can
be passaged at least fifty times without reduction in the
level of 2C polypeptides expressed by the cell lines.
Preferably, cell lines are capable of being cultured so as to
express 2C polypeptides at high levels, usually at least 0.2,
1, 10, 20, 50, 100, 200 or 500 pmol of 2C polypeptide per mg
of microsomal protein. For example, the 2C19 expression level
of many cell lines of the invention is typically about
0.2-10,000, 1-200, 7-100, 10-50 or 10-20 pmol 2C19 polypeptide per
mg microsomal protein. An expression level of 10 pmol 2C19
per mg microsomal protein means that 2C19 represents about
0.06% of total cellular protein. For E. coli and insect cell
lines, the recombinant P450 protein can comprise 5-10% of
total cellular protein. Often, the stable cell lines of the
invention express more than one P450 polypeptide. These cell
lines express 2C18 and/or 2C19 together with other members of
the 2C family, or other P450 cytochromes such as 1A1, 1A2,
2A6, 3A3, 3A4, 2B6, 2B7, 2C9, 2D6, and/or 2E1.
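
The conversion from 10 pmol per mg of protein to roughly 0.06% by mass can be checked with a short calculation. The molecular weight of about 56,000 Da used here is an assumption typical of a cytochrome P450 polypeptide and is not stated in this passage. The percentage is computed relative to the protein mass in the denominator of the expression level.

```python
def percent_of_protein(pmol_per_mg: float, mw_da: float = 56_000.0) -> float:
    """Mass percentage of a polypeptide given its level in pmol per mg protein.

    mw_da is an assumed molecular weight in daltons (g/mol).
    """
    picograms_per_mg = pmol_per_mg * mw_da  # pmol x g/mol = pg
    picograms_in_one_mg = 1e9               # 1 mg = 1e9 pg
    return 100.0 * picograms_per_mg / picograms_in_one_mg


if __name__ == "__main__":
    print(percent_of_protein(10.0))    # 10 pmol/mg
    print(percent_of_protein(500.0))   # upper end of the "at least" series
```

With these assumptions, 10 pmol/mg corresponds to 560 ng of polypeptide per mg of protein, which rounds to the 0.06% quoted above.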

E. coli is one prokaryotic host useful for cloning
the polynucleotides of the present invention. Other microbial
hosts suitable for use include bacilli, such as Bacillus
subtilis, and other enterobacteriaceae, such as Salmonella,
Serratia, and various Pseudomonas species. Expression vectors
typically contain expression control sequences compatible with
the host cell, e.g., an origin of replication, any of a
variety of well-known promoters, such as the lactose promoter
system, a tryptophan (trp) promoter system, a beta-lactamase
promoter system, or a promoter system from phage lambda.
Vectors often also contain an operator sequence and/or a
ribosome binding site. The control sequences are operably
linked to a P450 DNA segment so as to ensure its
expression and control the expression thereof.

Other microbes, such as fungi, particularly yeast,
are also useful for expression. Saccharomyces is a
preferred host, with suitable vectors having expression
control sequences, such as promoters, including
3-phosphoglycerate kinase or other glycolytic enzymes, and an
origin of replication, termination sequences and the like as
desired. For example, the plasmid pAAH5 can be used. The
5'-noncoding sequence of the P450 2C cDNAs can be eliminated and
six adenosines added by polymerase chain reaction (PCR)
amplification to optimize expression in yeast cells. The 5'-
and 3'-primers recommended for amplification of 2C18 are
5'-GCAAGCTTAAAAAATGGATCCAGCTGTGGCTCT-3' (SEQ. ID. No. 15) and
5'-GCAAGCTTGCCAAACTATCTGCCCTTCT-3' (SEQ. ID. No. 16). This
includes addition of a Hind III restriction site at both ends
to allow insertion into the pAAH5 vector and six adenosines
at the 5'-end to optimize translation. The final 20 bases of
each sequence are specific for 20 bases at the 5'-end of 2C18
starting with the ATG for methionine and 20 bases of the
3'-noncoding region, respectively. The primers for 2C19 can be
constructed similarly. The yeast strain used, Saccharomyces
cerevisiae 334, can be propagated non-selectively in YPD medium
(1% yeast extract, 2% peptone, 2% dextrose) (Hovland et al.
(1989) Gene 83, 57-64) and Leu+ transformants selected on synthetic
minimal medium containing 0.67% nitrogen base (without amino
acids), 0.5% ammonium sulfate, 2% dextrose and 20 µg/ml
L-histidine (SD+His). Plates are made by the addition of 2%
agar. Yeast can be transformed by the lithium acetate method
of Ito et al. (1983) J. Bacteriol. 153, 163 and transformants
selected on SD+His. Cells are then grown
to mid-logarithmic phase (Oeda et al., DNA 4:203-210 (1985))
and microsomes containing recombinant protein can be prepared.
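
The layout of the two 2C18 primers described above (leading bases, Hind III site, adenosine run, gene-specific 20 bases) can be made explicit by splitting each primer around the Hind III recognition sequence AAGCTT. The sketch below is illustrative: the field names are invented, the primer strings are copied from the text as printed, and the code reports whatever lies between the site and the final 20 bases rather than assuming a particular number of adenosines.

```python
HIND_III = "AAGCTT"  # Hind III recognition sequence

# Primers as printed in the text (SEQ. ID. Nos. 15 and 16).
PRIMER_5 = "GCAAGCTTAAAAAATGGATCCAGCTGTGGCTCT"
PRIMER_3 = "GCAAGCTTGCCAAACTATCTGCCCTTCT"


def dissect(primer: str, site: str = HIND_III, specific_len: int = 20) -> dict:
    """Split a primer into leading bases, restriction site, linker and gene-specific tail."""
    pos = primer.find(site)
    if pos < 0:
        raise ValueError("restriction site not found in primer")
    site_end = pos + len(site)
    tail_start = max(site_end, len(primer) - specific_len)
    return {
        "leading": primer[:pos],               # bases 5' of the site
        "site": primer[pos:site_end],          # the Hind III site itself
        "linker": primer[site_end:tail_start], # e.g. the added adenosine run
        "specific": primer[tail_start:],       # final bases matching the cDNA
    }


if __name__ == "__main__":
    for name, primer in (("5'-primer", PRIMER_5), ("3'-primer", PRIMER_3)):
        print(name, dissect(primer))
```

For the 5'-primer the gene-specific tail begins with the ATG start codon, so the adenosine run abuts the initiating methionine codon.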

Insect cells (e.g., SF9) with appropriate vectors, 
usually derived from baculovirus, are also suitable for
expressing 2C polypeptides. See Luckow et al., Bio/Technology
6:47-55 (1988) (incorporated by reference for all purposes).

Mammalian tissue cell culture can also be used to
express and produce the polypeptides of the present invention
(see Winnacker, From Genes to Clones (VCH Publishers, N.Y.,
N.Y., 1987)). Suitable host cell lines include CHO cell lines
(e.g., V79) (Dogram et al. (1990) Mol. Pharmacol. 21, 607-613),
various COS cell lines, HeLa cells, myeloma cell lines
and Jurkat cells, hepatoma cell lines (Hep G2), and a
lymphoblastoid cell line AHH-1 TK+/- (Crespi et al. (1991)
Carcinogenesis 12, 355-359). Expression vectors for these
cells (e.g., pEBVHistK or pSV2) can include expression control
sequences, such as an origin of replication, a promoter (e.g.,
an HSV tk promoter or pgk (phosphoglycerate kinase) promoter),
an enhancer (Queen et al., Immunol. Rev. 89:49 (1986)), and
necessary processing information sites, such as ribosome
binding sites, RNA splice sites, polyadenylation sites (e.g.,
an SV40 large T Ag poly A addition site), and transcriptional
terminator sequences. Preferred expression control sequences
are promoters derived from immunoglobulin genes, SV40,
adenovirus, bovine papillomavirus, and the like. Expression
control sequences are operably linked to a DNA segment
encoding a P450 polypeptide so as to ensure the polypeptide is
expressed.

The vectors containing the polynucleotide sequences
of interest can be transferred into the host cell by
well-known methods, which vary depending on the type of cellular
host. For example, calcium chloride transfection is commonly
utilized for prokaryotic cells, whereas calcium phosphate
treatment or electroporation may be used for other cellular
hosts. (See generally Sambrook et al., Molecular Cloning: A
Laboratory Manual (Cold Spring Harbor Press, 2nd ed., 1989)
(incorporated by reference in its entirety for all purposes).)

Once expressed, the polypeptides of the invention
and their fragments can, if desired, be purified according to
standard procedures of the art, including ammonium sulfate
precipitation, affinity columns, column chromatography, gel
electrophoresis and the like (see generally Scopes, Protein
Purification (Springer-Verlag, N.Y., 1982)).

IV. Antibodies 

The invention also provides antibodies that
specifically bind to epitopes on the 2C18 and 2C19
polypeptides of the invention. Some antibodies specifically
bind to one member of the 2C family (e.g., 2C19) without
binding to nonallelic forms. Some antibodies specifically
bind to a single allelic form of a 2C member such as the 2C19
polypeptide having the amino acid sequence designated SEQ. ID.
No. 1. Antibodies that specifically bind to a 2C19
polypeptide without binding to a 2C9 polypeptide are
particularly useful in view of the relatively high degree of
sequence identity between these nonallelic variants. See
Table II. The production of non-human monoclonal antibodies,
e.g., murine, lagomorph or equine, is well known and can be
accomplished by, for example, immunizing an animal with a
preparation containing a 2C19 polypeptide or an immunogenic
fragment thereof. Human antibodies can be prepared using
phage-display technology. See, e.g., Dower et al., WO
91/17271 and McCafferty et al., WO 92/01047 (each of which is
incorporated by reference in its entirety for all purposes).
Humanized antibodies are prepared as described by Queen et
al., WO 90/07861.

V. Methods of Use 

A. Identification of Drugs Unsuitable for 
Administration to Poor Metabolizers of S-Mephenytoin

The identification of a 2C19 polypeptide as the
principal determinant of human S-mephenytoin 4'-hydroxylase
activity facilitates methods of screening drugs that are
metabolized by this enzyme. Such drugs likely lack efficacy
and/or show intolerable side effects in individuals having a
defect in S-mephenytoin 4'-hydroxylase activity (low
producers). The substantial absence of this activity in low
producers often results in an inability to detoxify such
drugs, preventing their elimination from the body.
Substantial absence of S-mephenytoin 4'-hydroxylase activity
can also prevent metabolic processing of certain drugs to
activated forms. Drugs suspected of being metabolized by
S-mephenytoin 4'-hydroxylase activity include, in addition to
mephenytoin itself, omeprazole, proguanil, diazepam and
certain barbiturates.

Drugs are screened for metabolic processing by
S-mephenytoin 4'-hydroxylase activity in a variety of assays.
See Example 5. In brief, the drug under test is usually
labelled with a radioisotope or otherwise. The drug is then
contacted with a 2C19 polypeptide exhibiting S-mephenytoin
4'-hydroxylase activity (e.g., the polypeptide designated SEQ.
ID. No. 1). The 2C19 polypeptide can be in purified form or
can be a component of a lysate of one of the cell lines
discussed in Section III. Often, the 2C19 polypeptide is part
of a microsomal fraction of a cell lysate. The 2C19
polypeptide can also be a component of an intact cell as many
drugs are taken up by such cells. Often, the reaction mixture
is supplemented with one or more of the following reagents:
dilauroylphosphatidylcholine, cytochrome P450 reductase, human
cytochrome b5, and NADPH. (See Example 5 for concentrations
of these reagents and a suitable buffer.) After an incubation
period (e.g., 30 min), the reaction is terminated and
centrifuged. The supernatant is analyzed for metabolic
activity, e.g., by a spectrographic or chromatographic method.
The assay is usually performed in parallel on a control
reaction mixture without a 2C19 polypeptide. Metabolic
activity is shown by a comparative analysis of supernatants
from the test and control reaction mixtures. For example, a
shift in retention time of radiolabelled peaks between test
and control under HPLC analysis indicates that the drug under
test is metabolized by S-mephenytoin 4'-hydroxylase activity.
Often, the test is repeated using an extract from human liver
in place of the 2C19 polypeptide. The appearance of a
labelled metabolic peak from the reaction using 2C19
recombinant organisms or 2C19 recombinant cell fractions
having the same HPLC retention time, and a specific activity
at least as high, as that observed for human liver microsomes
provides strong evidence that S-mephenytoin 4'-hydroxylase
activity plays a major role in processing the drug. The test
can also be repeated using other 2C members, such as 2C18, as
controls, in place of 2C19.
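
The decision rule in the preceding paragraph (a labelled peak absent from the control, with the same HPLC retention time as the liver microsome peak and a specific activity at least as high) can be written out as a short check. This is a sketch under stated assumptions: the `Peak` record, the function name and the retention-time tolerance `rt_tol` are hypothetical, and the values in the example are invented for illustration rather than taken from Example 5.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    retention_min: float      # HPLC retention time, minutes
    specific_activity: float  # nmol product per nmol P450 per minute


def supports_major_role(recombinant: Peak, liver: Peak,
                        control_peaks: list, rt_tol: float = 0.1) -> bool:
    """Apply the three criteria from the text to a candidate metabolite peak.

    rt_tol is an assumed tolerance (minutes) for calling two retention times equal.
    """
    same_retention = abs(recombinant.retention_min - liver.retention_min) <= rt_tol
    at_least_as_active = recombinant.specific_activity >= liver.specific_activity
    absent_from_control = all(
        abs(peak.retention_min - recombinant.retention_min) > rt_tol
        for peak in control_peaks
    )
    return same_retention and at_least_as_active and absent_from_control


if __name__ == "__main__":
    recombinant = Peak(retention_min=12.40, specific_activity=4.8)
    liver = Peak(retention_min=12.45, specific_activity=0.9)
    control = [Peak(retention_min=20.0, specific_activity=0.0)]  # parent drug only
    print(supports_major_role(recombinant, liver, control))
```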

Drugs can also be screened for metabolic dependence
on S-mephenytoin 4'-hydroxylase activity in transgenic
nonhuman animals. Some such animals have genomes comprising a
2C19 transgene (e.g., SEQ. ID. No. 2) operably linked to
control sequences so as to render the transgene capable of
being expressed in the animals. Other transgenic animals have
a genome containing homozygous null mutations of endogenous
2C19 genes. Mice and other rodents are particularly suitable
for production of transgenic animals. Drugs are administered
to transgenic animals in comparison with normal control
animals and the effects from administration are monitored.
Drugs eliciting different responses in the transgenic animals
than the control animals likely require S-mephenytoin
4'-hydroxylase activity for detoxification and/or activation.

Drugs identified by the above screening methods as
being metabolized by S-mephenytoin 4'-hydroxylase activity
should generally not be administered to individuals known to
be deficient in this enzyme, or should be administered at
different dosages. Indeed, in the absence of data on an
individual patient's S-mephenytoin 4'-hydroxylase phenotype, it
is often undesirable to administer such drugs to any member of
an ethnic group known to be at high risk for S-mephenytoin
4'-hydroxylase deficiency (e.g., Orientals and possibly blacks).
If it is essential to administer drugs identified by the above
screening procedures to individuals known to be at risk of
enzymic deficiency (e.g., no alternative drug is available), a
treating physician is at least apprised of a need for vigilant
monitoring of the patient's response to the drug. In general,
the identification of a new drug as a substrate for 2C19 would
militate against further development of the drug.



B. Screening Compounds for Mutagenic, Cytotoxic or
Carcinogenic Activity

The invention provides methods of measuring the
mutagenic, cytotoxic or carcinogenic potential of a compound.
In some methods, mutagenic, cytotoxic or carcinogenic effects
are assayed directly on a cell line harboring one or more
recombinant cytochrome P450 enzymes. In these methods, a
compound under test is added to the growth medium of a cell
line expressing 2C19 and/or 2C18 and/or other cytochrome
P450s. Often, one or more of the reagents discussed in
Section V(I), supra, is also added. After a suitable
incubation, mutagenic, cytotoxic or carcinogenic effects are
assayed. Mutagenic effects are assayed, e.g., by detection of
the appearance of drug-resistant mutant cell colonies
(Thompson, Methods Enzymol. 58:308, 1979). For example,
mutagenicity can be evaluated at the hgprt locus (Penman et
al., (1987) Environ. Mol. Mutagenesis 10, 35-60).
Cytotoxicity can be assayed from viability of the cell line
harboring the P450 enzyme(s). Carcinogenicity can be assessed
by determining whether the cell line harboring the P450
enzymes has acquired anchorage-independent growth or the
capacity to induce tumors in athymic nude mice.

In other methods, a suspected compound is assayed in
a selected test cell line rather than a cell line harboring
P450 enzymes. In these methods, the compound under test is
contacted with P450 2C19 and/or 2C18 and/or other P450
enzymes. The P450 enzyme(s) can be provided in purified form,
or as components of lysates or microsomal fractions of cells
harboring the recombinant enzyme(s). The P450 enzyme(s) can
also be provided as components of intact cells. Usually, one
or more of the reagents discussed in Section V(l), supra, is
also added. Optionally, the appearance of metabolic products
from the suspected compound can be monitored by techniques
such as thin layer chromatography or high performance liquid
chromatography and the like.

The metabolic products resulting from treatment of
the suspected compound with P450 enzyme(s) are assayed for
mutagenic, cytotoxic or carcinogenic activity in a test cell
line. The test cell line can be present during the metabolic
activation of the mutagen or can be added after activation has
occurred. Suitable test cell lines include a mutant strain of
Salmonella typhimurium bacteria having auxotrophic histidine
mutations (Ames et al., Mut. Res. 31:347-364 (1975)). Other
standard test cell lines include Chinese hamster ovary cells
(Galloway et al., Environ. Mutagen. 7:1 (1985); Gulati et al.,
Environ. Mol. Mutagenesis 13:133-193 (1989)) for analysis of
chromosome aberration and sister chromatid exchange induction,
and mouse lymphoma cells (Myhr et al., Prog. Mut. Res.
5:555-568 (1985)).

The use of defined P450 enzymes for activation of
compounds in the present methods offers significant advantages
over previous methods in which rat or human S9 supernatant
liver fractions (containing an assortment of P450 enzymes)
were used. The present methods are more reproducible and also
provide information on the mechanisms by which mutagenesis,
cytotoxicity and carcinogenicity are effected.

C. Identification of Potential Chemopreventive Drugs

The invention also provides methods for identifying
drugs having chemopreventive activity. These methods employ
similar procedures to those discussed in paragraph (2) above
except that the methods are performed using a known mutagenic,
cytotoxic or carcinogenic agent, together with a suspected
chemopreventive agent. Mutagenic, cytotoxic or carcinogenic
effects in the presence of the chemopreventive agents are
compared with those in control experiments in which the
chemopreventive agent is omitted.

D. Screening for Potential Chemotherapeutic Drugs

The invention provides analogous methods to those
described in paragraph (2), supra, for screening
chemotherapeutic agents. In some methods, chemotherapeutic
activity is determined directly on a tumorigenic cell line
expressing 2C19 and/or 2C18 and/or other cytochrome P450
enzymes. In other methods, chemotherapeutic activity is
determined on a tumorigenic test cell line. Chemotherapeutic
activity is evidenced by reversion of the transformed
phenotype of cells resulting in reduced soft agar growth or
reduced tumor formation in nude mice.

E. Programmed Cell Death 

The invention provides analogous methods to those
described in paragraph (2), supra, for identifying agents that
induce programmed cell death or apoptosis. Apoptosis may have
an important impact on prevention of malignant transformation.
Programmed cell death is assayed by DNA fragmentation or
cell-surface antigen analysis.



F. Monitoring 2C18 and 2C19 Polypeptides

The invention provides methods of quantitating the
amount of the specific protein in mammalian tissues by
measuring the complex formed between the antibody and proteins
in the tissue. For example, a biological sample is contacted
with an antibody under conditions such that the antibody binds
to specific proteins forming an antibody:protein complex which
can be quantitatively detected.

VI. Diagnosing 2C19 and 2C18 Polymorphisms
Diagnostic Assays for Identifying Individuals Deficient in
S-Mephenytoin 4'-Hydroxylase

The invention provides a variety of assays for
identifying individuals deficient in S-mephenytoin
4'-hydroxylase activity. Such individuals comprise about 3-5% of
Caucasian populations and about 20% of Orientals and possibly
blacks. Identification of individuals deficient in
S-mephenytoin 4'-hydroxylase activity is important in selecting
appropriate drugs for treatment of these individuals.
Usually, drugs that are metabolized by S-mephenytoin
4'-hydroxylase should not be administered to these individuals.
The assays diagnose mutations in cDNA or genomic DNA encoding
2C19, which, as discussed above, is the principal human
determinant of S-mephenytoin 4'-hydroxylase activity. The
cDNA assays are particularly useful for de novo localization
of a 2C19 mutation to a particular nucleotide or nucleotides.
The genomic assays are particularly useful for large-scale
screening of individuals for the presence of a mutation that
has previously been localized.

A. Amplification Technologies

Many of the diagnostic assays rely on amplification
of part or all of a DNA segment encoding a 2C19 polypeptide
(e.g., a 2C19 gene). In a preferred embodiment, target
segments encoding a 2C19 polypeptide are amplified by the
polymerase chain reaction. The PCR process is described in,
e.g., U.S. Patent Nos. 4,683,195; 4,683,202; and 4,965,188;
PCR Technology: Principles and Applications for DNA
Amplification (ed. Erlich, Freeman Press, New York, NY, 1992);
PCR Protocols: A Guide to Methods and Applications (eds. Innis
et al., Academic Press, San Diego, CA, 1990); Mattila et al.,
Nucleic Acids Res. 19:4967 (1991); Eckert & Kunkel, PCR Methods
and Applications 1:17 (1991); PCR (eds. McPherson et al., IRL
Press, Oxford) (each of which is incorporated by reference in
its entirety for all purposes). Reagents, apparatus and
instructions for using the same are commercially available
(e.g., from PECI). Other amplification systems include
ligase chain reaction, Qβ RNA replicase and
RNA-transcription-based amplification systems.

To amplify a target nucleic acid sequence in a
sample by PCR, the sequence must be accessible to the
components of the amplification system. Accessibility can be
achieved by isolating the nucleic acids from the sample. A
variety of techniques for extracting nucleic acids from
biological samples are known in the art. Alternatively, if
the sample is fairly readily disruptable, the nucleic acid
need not be purified prior to amplification by the PCR
technique, i.e., if the sample comprises cells,
particularly peripheral blood lymphocytes or monocytes, lysis
and dispersion of the intracellular components may be
accomplished merely by suspending the cells in hypotonic
buffer. See Han et al., Biochemistry 26:1617-1625 (1987).



WO 95/30766    PCT/US95/05744

For amplification of mRNA sequences, a first step is
the synthesis of a DNA copy (cDNA) of the region to be
amplified by reverse transcription. Reverse transcription is
the polymerization of deoxynucleoside triphosphates to form
primer extension products that are complementary to a
ribonucleic acid template. The process is effected by reverse
transcriptase, an enzyme that initiates synthesis at the
3'-end of the primer and proceeds toward the 5'-end of the
template until synthesis terminates. Examples of suitable
polymerizing agents that convert the RNA target sequence into
a complementary, copy-DNA (cDNA) sequence are avian
myeloblastosis virus reverse transcriptase and Thermus
thermophilus DNA polymerase, a thermostable DNA polymerase
with reverse transcriptase activity marketed by PECI. Reverse
transcription can be carried out as a separate step, or in a
homogeneous reverse transcription-polymerase chain reaction
(RT-PCR). Polymerizing agents suitable for synthesizing a
complementary, copy-DNA (cDNA) sequence from the RNA template
are reverse transcriptase (RT), such as avian myeloblastosis
virus RT, Moloney murine leukemia virus RT, or Thermus
thermophilus (Tth) DNA polymerase, a thermostable DNA
polymerase with reverse transcriptase activity marketed by
PECI.

The first step of each amplification cycle of the
PCR involves the separation of the nucleic acid duplex formed
by the primer extension. Strand separation is achieved by
heating the reaction to a sufficiently high temperature for a
sufficient time to cause the denaturation of the duplex but
not to cause an irreversible denaturation of the polymerase
(see U.S. Patent No. 4,965,188). Typical heat denaturation
involves temperatures ranging from about 80°C to 105°C for
times ranging from seconds to minutes. Typically, any initial
RNA template is also degraded during the denaturation step
leaving only DNA template. Other means of strand separation,
including physical, chemical, or enzymatic means, are also
possible.

Once the strands are separated, the next step
involves hybridizing the separated strands with primers that
flank the target sequence. The primers are then extended to
form complementary copies of the target strands.
Template-dependent extension of primers in PCR is catalyzed by a
polymerizing agent in the presence of adequate amounts of four
deoxyribonucleotide triphosphates (typically dATP, dGTP, dCTP,
and dTTP) in a reaction medium comprised of the appropriate
salts, metal cations, and pH buffering system. Suitable
polymerizing agents include, for example, E. coli DNA
polymerase I or its Klenow fragment, T4 DNA polymerase, Tth
polymerase, and Taq polymerase, a heat-stable DNA polymerase
isolated from Thermus aquaticus commercially available from
Perkin-Elmer Cetus Instruments (PECI, Norwalk, CT). See U.S.
Patent No. 4,889,818. See Gelfand, 1989 in PCR Technology,
supra. The polymerizing agents initiate synthesis at the
3'-end of the primer and proceed toward the 5'-end of the
template until synthesis terminates.

The primers are designed so that the position at
which each primer hybridizes along a duplex sequence is such
that an extension product synthesized from one primer, when
separated from the template (complement), serves as a template
for the extension of the other primer. The cycle of
denaturation, hybridization, and extension is repeated as many
times as necessary to obtain the desired amount of amplified
nucleic acid.
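
As a rough illustration of how the number of cycles relates to
the amount of product, the following Python sketch assumes that
each cycle multiplies the template by a fixed factor (ideal
doubling when efficiency is 1.0). This is a simplifying
assumption: real reactions run below full efficiency and
eventually plateau. The function name and the efficiency
parameter are hypothetical and do not come from the cited
references.

```python
import math


def cycles_needed(initial_copies, target_copies, efficiency=1.0):
    """Smallest cycle count n with initial * (1 + efficiency)**n >= target.

    efficiency=1.0 models ideal doubling per cycle (an assumption);
    lower values model less efficient reactions.
    """
    if initial_copies <= 0 or target_copies <= 0:
        raise ValueError("copy numbers must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    if target_copies <= initial_copies:
        return 0
    growth = 1 + efficiency
    n = math.ceil(math.log(target_copies / initial_copies) / math.log(growth))
    # Correct for floating-point rounding near exact powers of the growth factor.
    while initial_copies * growth ** n < target_copies:
        n += 1
    while n > 0 and initial_copies * growth ** (n - 1) >= target_copies:
        n -= 1
    return n
```

Under the ideal-doubling assumption the yield grows
geometrically, which is why a modest number of cycles suffices
for most diagnostic amplifications.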

The primers are selected to be substantially
complementary to the different strands of each specific
sequence to be amplified. This means that the primers must be
sufficiently complementary to hybridize with their respective
strands. Therefore, the primer sequence need not reflect the
exact sequence of the template. For example, a
non-complementary nucleotide fragment may be attached to the 5'
end of the primer with the remainder of the primer sequence
being complementary to the strand. Alternatively,
non-complementary bases or longer sequences can be interspersed
into the primer, provided that the primer sequence has
sufficient complementarity with the sequence of the strand to
be amplified to hybridize therewith and thereby form a
template for synthesis of the extension product of the other
primer.

Paired primers for amplification of a given segment
of DNA are designated forward and reverse primers.
Conventionally, the orientation of a double-stranded DNA
molecule is that of the sense (or coding) strand, with the
5'-terminus of the coding strand being drawn on the left (see,
e.g., Fig. 15). Under this convention, the forward primer
hybridizes to a double-stranded DNA molecule at a position 5'
(or upstream) from the reverse primer. The forward primer
hybridizes to the complement of the coding strand of the
double-stranded sequence (i.e., the antisense strand) and the
reverse primer hybridizes to the coding strand.
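
The orientation convention can be made concrete with a short
Python sketch. It is only an illustration under simplifying
assumptions: exact matching, no mismatches, and a single
occurrence of each primer site. The function names are
hypothetical. Because the forward primer hybridizes to the
antisense strand, its sequence reads the same as the sense
strand; because the reverse primer hybridizes to the sense
strand, it is the reverse complement of the primer that appears
in the sense sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def locate_primers(sense, forward, reverse):
    """Locate a primer pair on the sense strand.

    Returns (forward_start, reverse_end, amplicon_length) using 0-based,
    end-exclusive sense-strand coordinates, or None if either primer
    site is not found with the forward primer upstream.
    """
    sense = sense.upper()
    fwd_start = sense.find(forward.upper())
    if fwd_start < 0:
        return None
    # The reverse primer anneals to the sense strand, so the sense
    # strand contains the reverse complement of the primer sequence.
    rev_site = reverse_complement(reverse)
    rev_start = sense.find(rev_site, fwd_start)
    if rev_start < 0:
        return None
    rev_end = rev_start + len(rev_site)
    return fwd_start, rev_end, rev_end - fwd_start
```

The amplified segment runs from the 5' end of the forward primer
to the 5' end of the reverse primer, so both primer sequences are
included in the product length.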

The appropriate length of a primer depends on the
intended use of the primer but typically ranges from 10-100,
15-50, 15-30, or more usually, 15 to 25 nucleotides. Shorter
primers tend to lack specificity for a target nucleic acid
sequence and generally require cooler temperatures to form
sufficiently stable hybrid complexes with the template.
Longer primers are expensive to produce and can sometimes
self-hybridize to form hairpin structures.
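
The dependence of hybrid stability on primer length and
composition can be estimated with the Wallace rule, a common
rule of thumb for short oligonucleotides (roughly 14-20
nucleotides): Tm is approximately 2°C for each A or T plus 4°C
for each G or C. The Python sketch below is offered only as an
approximation and is not part of the described methods; the
function name is hypothetical.

```python
def wallace_tm(primer):
    """Estimate primer melting temperature in deg C by the Wallace rule.

    Tm ~ 2 * (A + T) + 4 * (G + C); a rough guide for short primers only.
    """
    seq = primer.upper()
    unknown = set(seq) - set("ACGT")
    if unknown:
        raise ValueError(f"unexpected characters: {sorted(unknown)}")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc
```

Shorter primers give lower estimates, consistent with the need
for cooler hybridization temperatures noted above.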

The spacing of primers determines the length of
segment to be amplified. The spacing is not usually critical
and amplified segments can range in size from about 25 bp to
at least 35 kbp. Segments of 25-2000, 50-1000, 100-500 bp or
about 400 bp are typical. For larger segments, difficulties
may occasionally be encountered in obtaining efficient and
accurate amplification. For smaller segments, analysis of
amplification products may be more difficult.

The primer can be labelled, if desired, by
incorporating a label detectable by spectroscopic,
photochemical, biochemical, immunochemical, or chemical means.
For example, useful labels include 32P, fluorescent dyes,
electron-dense reagents, enzymes (as commonly used in an
ELISA), biotin, or haptens and proteins for which antisera or
monoclonal antibodies are available. A label can also be used
to "capture" the primer, so as to facilitate the
immobilization of either the primer or a primer extension
product, such as amplified DNA, on a solid support.



B. Tissue Sample for Analysis

The diagnostic assays are performed on a tissue
sample containing a nucleic acid encoding a 2C19 polypeptide.
For assay of genomic DNA, virtually any tissue sample (other
than pure red blood cells) is suitable. For example,
convenient tissue samples include whole blood, buccal, skin
and hair. For assay of cDNA, the tissue sample must be
obtained from an organ in which a 2C19 gene is expressed, such
as the liver. Liver samples from dead patients are suitable
for de novo localization of mutations (see Section C, infra).
However, for screening of living persons, liver biopsies,
while feasible, are generally undesirable. Thus, for
large-scale screening of living persons, analysis of genomic DNA
is preferred.



C. De Novo Localization of 2C19 Polymorphisms

2C19 polymorphisms are identified and localized to
specific nucleotides by comparison of nucleic acids from poor
metabolizing individuals with nucleic acids from extensive
metabolizers. The comparison can be initiated directly at the
genomic level. If intron primers are known, individual exons
and intron/exon junctions of 2C19 can be amplified from
genomic DNA. These fragments can be sequenced directly or
analyzed by single-stranded conformational analysis to
indicate the presence of a polymorphism and then analyzed by
sequencing.

Comparison is sometimes initiated at the cDNA level
because of the shorter size of cDNA (about 1750 bp) relative
to genomic DNA (about 55 kbp). cDNA is amplified from liver
samples of individuals known to have phenotypic S-mephenytoin
metabolic deficiencies, and the cDNA sequence is compared with
the wildtype sequence shown in SEQ. ID. No. 2. Often, the
full-length cDNA is amplified. An initial comparison can be
performed by single-stranded conformational analysis to
indicate the existence of a polymorphism. The polymorphism is
then localized by sequence analysis indicating the site of
mutations in cDNA. Of course, the amplification product can
also be sequenced directly without prior conformational
analysis. Having localized a mutation in cDNA, a
corresponding region of genomic 2C19 DNA is amplified. The
genomic DNA is usually amplified from primers spanning the
mutation. At least one of the primers for this amplification
usually comprises a subsequence of the cDNA sequence proximate
to the cDNA mutation (i.e., within 25-200 bp of it). Primers can
also comprise subsequences of genomic 2C19 DNA that have
already been sequenced, subsequences from related genomic
sequences, such as 2C18 or 2C9 (see de Morais et al., Biochem.
Biophys. Res. Commun. 194:194-201 (1993)) (incorporated by
reference in its entirety for all purposes), or can be random.
An amplified genomic fragment spanning the portion of the
coding region in which the cDNA polymorphism occurs is
sequenced and compared with the corresponding region from a
2C19 sequence from an individual exhibiting extensive
S-mephenytoin 4'-hydroxylase metabolism to identify the locus of
the genomic mutation.

In some instances, there will be a simple
relationship between genomic and cDNA mutations. That is, a
single base change in a coding region of genomic DNA can give
rise to a corresponding mutated codon in the cDNA. In other
instances, the relationship between genomic and cDNA mutations
is more complex. Thus, for example, a single base change in
genomic DNA creating an aberrant splice site can give rise to
deletion of a substantial segment of cDNA in a poor
metabolizing individual.

D. The 681 and 636 Polymorphisms

The principal mutation in individuals deficient in
the S-mephenytoin 4'-hydroxylase activity is designated the
681 polymorphism. See Example 7. The 681 polymorphism
results from a single-base mutation in genomic 2C19 DNA at
nucleotide position 681 of the coding region. A nucleotide in
a coding (i.e., exonic) region of genomic 2C19 DNA is
designated the same number as the corresponding nucleotide in
the cDNA sequence shown in SEQ. ID. No. 2, when the genomic
coding sequence is maximally aligned with the cDNA sequence.
The 681 polymorphism is a G-to-A transition at
nucleotide 681 of the coding region. Homozygous mutations at
this position occur in about 70% of individuals having a
low-producing (i.e., defective) S-mephenytoin 4'-hydroxylase
phenotype. The mutation is inherited in an autosomal
recessive fashion. Thus, individuals heterozygous in this
mutation usually exhibit normal (i.e., extensive) S-mephenytoin
4'-hydroxylase activity. Fortuitously, the mutation confers two
distinct properties that facilitate its identification. In
genomic DNA, the polymorphism results in loss of several
restriction enzyme sites (e.g., SmaI) and acquisition of other
restriction sites (e.g., EcoRII) in mutant individuals compared
with wildtype individuals. These restriction sites include the
681 nucleotide. In mRNA or cDNA, the 681 mutation results in a
deletion of 40 bp spanning nucleotides 643-682 of the wildtype
cDNA sequence shown in Fig. 12. The deletion is the
consequence of an altered splice pattern stemming from the
presence of the 681 polymorphism in genomic DNA.
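
The arithmetic behind the 40 bp figure, and its consequence for
the reading frame, can be checked with a minimal Python sketch.
The function names are hypothetical; the coordinates are those
given above. A deletion whose length is not a multiple of three
shifts the reading frame of everything downstream of it.

```python
def deletion_length(first, last):
    """Length in bp of a deletion spanning nucleotides first..last inclusive."""
    if last < first:
        raise ValueError("last must not precede first")
    return last - first + 1


def shifts_reading_frame(length):
    """True if a deletion of this length disrupts the triplet reading frame."""
    return length % 3 != 0
```

Applied to nucleotides 643-682, the first function returns 40
and the second returns True, since 40 is not divisible by 3.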

A second polymorphism is designated the 636
polymorphism. See Example 8. The 636 polymorphism results
from a single-base mutation in genomic 2C19 DNA at nucleotide
position 636. The 636 polymorphism is a G-to-A
transition that introduces a premature stop codon into
2C19 mRNA. The mutation is easily recognized by the loss of,
e.g., a BamHI site in both genomic DNA and cDNA and acquisition
of, e.g., a HinfI site. The mutation is inherited in an autosomal
recessive fashion. Homozygous mutations at nucleotide 636
account for about 10% of low-producing phenotypes in
Orientals. Heterozygous individuals having one allele
defective in the 636 polymorphism and the other allele
defective in the 681 polymorphism account for all or nearly
all of the remaining 15% of low-producing Oriental
individuals. Thus, the 681 and 636 polymorphisms collectively
account for all, or nearly all, low-producing phenotypes in
Orientals.
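
Because both mutations are inherited in an autosomal recessive
fashion, the expected phenotype follows directly from the pair
of alleles an individual carries. The following Python sketch
encodes that rule for the two defined mutations only. It is an
illustration, not a diagnostic procedure: the allele labels and
function name are hypothetical, and mutations at other loci are
not modeled.

```python
DEFECTIVE_ALLELES = {"681", "636"}
KNOWN_ALLELES = DEFECTIVE_ALLELES | {"wt"}


def predicted_phenotype(allele1, allele2):
    """Predict 'poor' or 'extensive' metabolizer status from two alleles.

    An individual is predicted to be a poor metabolizer only when both
    alleles carry a defined defect (681/681, 636/636 or 681/636).
    """
    for allele in (allele1, allele2):
        if allele not in KNOWN_ALLELES:
            raise ValueError(f"unknown allele label: {allele!r}")
    both_defective = (allele1 in DEFECTIVE_ALLELES
                      and allele2 in DEFECTIVE_ALLELES)
    return "poor" if both_defective else "extensive"
```

Note that a 681/636 compound heterozygote is predicted to be a
poor metabolizer, whereas a wt/681 or wt/636 heterozygote is
not.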
In Caucasians, the 636 polymorphism is less
prevalent and some low producing individuals probably have a 
mutation at a locus other than nucleotide 681 or 636 of the 
coding sequence. Conceivably, a few mutations might occur in 
other genes that exert regulatory control over the 2C19 gene. 
However, most, if not all, of the remaining mutations probably 
result from additional polymorphisms in the 2C19 gene. 

E. Screening Assays for Defined Mutations

The invention provides assays that permit large-scale
screening of individuals for the presence of defined
mutations. Of course, detection of the 681 and 636 mutations,
which account for all or nearly all deficiencies in Orientals
and about 75% of deficiencies in Caucasians, is of primary
importance. An assay on an individual under test is often
performed in parallel with control assays on DNA samples from
subjects of known phenotype (i.e., extensive or poor
metabolizer of S-mephenytoin).

1. Genomic Assays

Assays are preferably performed on a genomic
substrate because of the ready availability of tissue samples
containing genomic DNA.

a. Amplification of Segments Spanning a Defined Mutation

A preferred strategy for analysis entails
amplification of a DNA sequence spanning previously localized
polymorphism(s) (e.g., the 681 and/or 636 polymorphisms).
Amplification of such a sequence can be primed from forward
and reverse primers that hybridize to a 2C19 gene on opposite
sides of a mutation (e.g., the 681 mutation), but which do not
hybridize to the mutated nucleotide itself. That is, for
detection of the 681 polymorphism, the forward primer
hybridizes upstream or 5' to the 681 nucleotide and the
reverse primer hybridizes downstream or 3' to this nucleotide.
Similarly, for detection of the 636 polymorphism, the forward
primer hybridizes upstream or 5' to the 636 nucleotide and the
reverse primer hybridizes downstream or 3' to this nucleotide.
For simultaneous analysis of 636 and 681 polymorphisms, the
forward primer hybridizes upstream or 5' to the 636 nucleotide
and the reverse primer hybridizes downstream or 3' to
nucleotide 681.

The forward primer is sufficiently complementary to
the antisense strand of a 2C19 DNA sequence to hybridize
therewith and the reverse primer is sufficiently complementary
to the sense strand of the 2C19 sequence to hybridize
therewith. The primers usually comprise first and second
subsequences from opposite strands of a double-stranded 2C19
DNA sequence. Isolated points of mismatch between a primer
and a corresponding 2C19 subsequence can usually be tolerated
but are not preferred. It is particularly important to avoid
mismatches in the two nucleotides at the 3' end of the primer
(especially the terminal nucleotide).

Because allelic variants of 2C19 exhibit at least
about 97% sequence identity to each other, it is not critical
which variant is selected as a source of subsequences for
incorporation into forward and reverse primers. For example,
suitable subsequences can be obtained from the genomic 2C19
sequence defined as wildtype in Figs. 15-17. Fig. 15 provides
genomic sequence immediately flanking the 681 mutation, and
Figure 16 provides more distal flanking sequences. Figure 17
provides genomic sequence flanking the 636 mutation. These
figures provide sufficient sequence for selection of a
multitude of paired primers for amplification of a sequence
spanning the 681 and/or 636 polymorphisms. Although there is
no apparent advantage for doing so, additional genomic
sequence flanking the regions already sequenced could easily
be determined by PCR-based gene walking. See Parker et al.,
Nucl. Acids Res. 19:3055-3060. A specific primer for the
sequenced region is paired with a general primer that
hybridizes to the flanking region.

Forward primers often comprise about 10-50 and
preferably 15-30 contiguous nucleotides from the wildtype 2C19
sequences shown in Figs. 15-17 (which is the coding or sense
sequence). Reverse primers often comprise about 10-50 or
15-30 nucleotides from the complement of the wildtype 2C19
sequence shown in Figs. 15-17. The complement of the sequence
shown in Figs. 15-17 is also referred to as the antisense
sequence. A primer (or its complement) preferably exhibits
100% sequence identity with a corresponding 2C19 subsequence
to which it hybridizes over a window of about 15-30 bp. For
amplification of the 681 polymorphism, forward primers
preferably comprise a segment of contiguous nucleotides from
the fourth intronic region and reverse primers a segment of
contiguous nucleotides from the fifth exonic or intronic
region. For amplification of the 636 polymorphism, forward
primers preferably comprise a segment of contiguous
nucleotides from the third intronic region and reverse primers
a segment of contiguous nucleotides from the fourth intronic
region. For amplification of both the 636 and 681
polymorphisms, forward primers preferably comprise a segment
of contiguous nucleotides from the third intronic region and
reverse primers a segment of contiguous nucleotides from the
fifth exonic region or fifth intronic region. See Figure 19.
As noted above, the spacing of the subsequences is not
critical, but a separation of about 50-2000 bp is typical. For
simultaneous amplification of the 636 and 681 mutations, the
spacing is typically 1000-1500 bp. For amplification of
either mutation alone, a spacing of about 400 bp is typical.

Preferred primers exhibit perfect sequence identity
to 2C19 and lesser sequence identity to corresponding regions
of related genes, such as 2C9 and 2C18. Such primers are
designed by comparison of the wildtype 2C19 sequence shown in
Figs. 15-17 with corresponding sequences from 2C9 and 2C18
described by de Morais et al., supra. In general, sequence
divergence between the three genes is expected to be greater
in intronic sequences. An exemplary pair of primers for
amplifying a segment spanning the 681 mutation is described in
Example 7. A forward primer, 5'-AATTACAACCAGAGCTTGGC-3' (SEQ.
ID. No. 55), exhibits perfect sequence identity to a
subsequence from the wildtype 2C19 sense strand within
intron 4. A reverse primer, 5'-TATCACTTTCCATAAAAGCAAG-3'
(SEQ. ID. No. 56), exhibits perfect sequence identity to the
antisense strand of the wildtype 2C19 sequence within exon 5.
The amplification product from these primers has a length of
169 bp. An exemplary pair of primers for amplifying a segment
spanning the 636 mutation is described in Example 8. A
forward primer, 5'-TATTATCTGTTAACTAATATGA-3' (SEQ. ID. No. 57),
exhibits perfect sequence identity to a subsequence from the
wildtype 2C19 sense strand within intron 3. A reverse primer,
5'-ACTTCAGGGCTTGGTCAATA-3' (SEQ. ID. No. 58), exhibits perfect
sequence identity to the antisense strand of the wildtype 2C19
sequence within intron 4. The amplification product from
these primers has a length of 329 bp.

Having amplified a segment of a 2C19 gene known to
span a polymorphism, a variety of assays are available for
determining whether a mutation is present in an individual
under test. A generally applicable, but relatively laborious,
assay is to sequence the amplified fragment across the
polymorphic locus and compare the resulting sequence with the
wildtype 2C19 sequence shown in Figs. 15-17.

A simpler assay, but one applicable to only certain
mutations, is to compare the size or restriction profile of
the amplified segment, optionally in comparison with a
corresponding wildtype 2C19 segment. For the 681
polymorphism, restriction analysis provides a rapid and
clear-cut means of identifying a mutant allele. The 681
polymorphism results in loss of a SmaI site and acquisition of
an EcoRII site in mutant alleles. Thus, SmaI digestion of a
wildtype allele produces an extra band compared with a mutant
allele. For the amplification product obtained using the
exemplified primers discussed above, SmaI digestion of the
wildtype product yields fragments of 120 and 49 bp, whereas
the mutant amplification product remains uncut yielding a
single fragment of 169 bp. In individuals homozygous for the
wildtype allele, only the 120 bp and 49 bp bands are present.
In individuals homozygous for the mutant allele, only the 169
bp band is present. In heterozygotes, all three bands (i.e.,
169, 120 and 49 bp) are present. The bands can usually be
detected by agarose or acrylamide gel electrophoresis and
ethidium bromide staining. If greater sensitivity is needed,
the amplification product is labelled and the bands detected
by, e.g., autoradiography. Of course, the assay can also be
performed using an isoschizomer of SmaI with identical
results. The assay can also be performed by digesting with
EcoRII or an isoschizomer thereof. In this case, one obtains
a mirror image of the results obtained for SmaI digestion,
because the mutant 2C19 allele contains an additional EcoRII
site relative to the wildtype allele. As a quality control
measure, both SmaI and EcoRII digestions can be performed on
separate aliquots of a test sample. Of course, any other
enzyme that recognizes a site that includes the 681
polymorphism can also be used. For example, alternatives to
SmaI (i.e., that cleave only the wildtype allele) include
AvaI, MspI, NciI, ScrFI and TspEI.
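
The expected SmaI banding patterns for the 169 bp product can be
tabulated with a short Python sketch. It uses only the fragment
sizes stated above (169 bp uncut; 120 and 49 bp after cleavage of
the wildtype allele). The function and constant names are
hypothetical, and complete digestion is assumed.

```python
AMPLICON_BP = 169            # product of the exemplified 681 primers
SMAI_FRAGMENTS_BP = (120, 49)  # wildtype product after SmaI cleavage


def smai_bands(genotype):
    """Expected band sizes (bp, largest first) after SmaI digestion.

    genotype is a pair of allele labels, each 'wt' or '681'.
    Assumes complete digestion of every wildtype product.
    """
    bands = set()
    for allele in genotype:
        if allele == "wt":
            bands.update(SMAI_FRAGMENTS_BP)  # wildtype site is cut
        elif allele == "681":
            bands.add(AMPLICON_BP)           # mutant site is lost, no cut
        else:
            raise ValueError(f"unknown allele label: {allele!r}")
    return sorted(bands, reverse=True)
```

The two cleavage fragments sum to the uncut length (120 + 49 =
169), which provides a simple internal check on any gel reading.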

The 636 polymorphism can be similarly analyzed by
digestion with, e.g., BamHI. BamHI digestion of a wildtype
allele produces an extra band compared with a mutant allele.
For the amplification product obtained using the exemplified
primers discussed above, BamHI digestion of the wildtype
product yields fragments of 233 and 96 bp, and digestion of
the mutant product yields a single fragment of 329 bp. In
individuals homozygous for the wildtype allele, only the 233
bp and 96 bp bands are present. In individuals homozygous for
the mutant allele, only the 329 bp band is present. In
heterozygotes, all three bands are present. Of course, other
enzymes that cut the wildtype allele at the polymorphic locus
but not the 636 mutant allele, or vice versa, can also be
used. For example, alternatives to BamHI include AlwI, BsaJI,
BstYI, DpnI, EcoRII, NlaIV, Sau3AI and ScrFI. Enzymes that
recognize a site on the mutant allele including nucleotide
636, but do not recognize the wildtype allele, include HinfI
and TfiI.
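
The band patterns described above for the SmaI (681) and BamHI (636) digests can be read off mechanically. The following is a minimal sketch in Python, not part of the disclosed assay: the fragment sizes are those quoted in the text for the exemplified primers, while the function name, the assay labels and the 5 bp sizing tolerance are assumptions made purely for illustration.

```python
# Fragment sizes (bp) quoted in the text for the exemplified amplification products.
# In both assays the enzyme cuts only the wildtype product.
ASSAYS = {
    "681/SmaI": {"uncut": 169, "cut": (120, 49)},
    "636/BamHI": {"uncut": 329, "cut": (233, 96)},
}


def call_genotype(assay, observed_bands, tolerance=5):
    """Classify a gel lane from its observed band sizes (bp).

    `tolerance` is a hypothetical allowance for sizing error on the gel.
    """
    spec = ASSAYS[assay]

    def seen(size):
        return any(abs(size - band) <= tolerance for band in observed_bands)

    has_uncut = seen(spec["uncut"])
    cut_seen = [seen(size) for size in spec["cut"]]

    if all(cut_seen) and has_uncut:
        return "wt/mut"    # all three bands: heterozygote
    if all(cut_seen) and not has_uncut:
        return "wt/wt"     # only the two digestion fragments
    if has_uncut and not any(cut_seen):
        return "mut/mut"   # only the uncut product
    return "no call"       # missing or partial pattern: repeat the digest


if __name__ == "__main__":
    print(call_genotype("681/SmaI", [169, 120, 49]))
    print(call_genotype("636/BamHI", [329]))
```

A lane showing only part of the expected pattern is reported as "no call" rather than forced into a genotype, since an incomplete digest could otherwise be misread.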

For simultaneous detection of the 681 and 636
polymorphisms after amplification of a fragment spanning both
polymorphisms, the DNA can be double digested with two of the
enzymes mentioned above. One enzyme should distinguish
the mutant 681 allele from a wildtype allele and the
other should distinguish the mutant 636 allele from a wildtype
allele. For example, double digestion with SmaI and BamHI is
suitable. The double digestion generates six different
restriction patterns corresponding to the six possible
genotypes: wt/wt, wt/681, wt/636, 681/681, 636/636 and
681/636. See Figure 19.
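
The count of six genotypes follows from pairing three alleles (wildtype, 681 mutant, 636 mutant) without regard to order. The short Python sketch below enumerates them; it assumes, as the genotype list above implies, that a given allele carries at most one of the two mutations. The names used are illustrative only.

```python
from itertools import combinations_with_replacement

# Assumption: each allele is wildtype or carries exactly one of the two mutations.
ALLELES = ("wt", "681", "636")


def diploid_genotypes(alleles=ALLELES):
    """Return every unordered pair of alleles, i.e. every diploid genotype."""
    return ["/".join(pair) for pair in combinations_with_replacement(alleles, 2)]


if __name__ == "__main__":
    for genotype in diploid_genotypes():
        print(genotype)
```

With n alleles the number of unordered pairs is n(n+1)/2, which gives 6 for n = 3.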

In another assay, amplification products are
subjected to single-stranded conformational analysis. See,
e.g., Hayashi, PCR Methods & Applications 1, 34-38 (1991);
Orita, Proc. Natl. Acad. Sci. USA 86, 2766-2770 (1989); Orita
et al., Genomics 5, 874-879 (1989). This method is capable of
detecting many single base mutations in DNA fragments up to
200 bp irrespective of whether the mutation causes a change in
restriction fragment profile. In this method, the PCR
reaction is performed using at least one labelled nucleotide
or labelled primer to obtain a labelled amplified fragment.
The amplification product is then denatured and the strands
resolved by polyacrylamide gel electrophoresis under
nondenaturing conditions. Mutations are detected by altered
mobility of separated single strands.

b. Selective Amplification of an Allelic Variant

An alternative method for detecting defined
mutations in a 2C19 gene employs a selective strategy whereby
a wildtype allele is amplified without amplification of a
mutant allele (or vice versa). This is accomplished by
designing one of the primers to hybridize to a subsequence
overlapping a defined polymorphism (for example, the 681
polymorphism). Such a primer can be designed to hybridize to
one polymorphic allele without hybridizing to the other.
Thus, when such a primer is paired with a second primer
hybridizing distal to the polymorphic region, amplification
will only occur for one polymorphic allele.

For diagnosis of the 681 polymorphism, selective
amplification of the wildtype allele of 2C19 can be
accomplished using a forward primer that has about 10-50, and
usually 15-30, nucleotides from the wildtype 2C19 sequence
shown in Fig. 15 or 16, including nucleotide 681. Such a
forward primer, when paired with any suitable reverse primer
downstream from nucleotide 681 (i.e., sufficiently
complementary to the sense strand of 2C19 to hybridize
therewith), can be used to amplify selectively the wildtype
allele without amplifying a mutant allele. The selectivity
between amplification of wildtype and mutant alleles is
greatest when the 681 nucleotide occurs near, or preferably
at, the 3' end of the primer. Because extension proceeds from
the 3' end of the primer, a mismatch at or near this position
is most inhibitory of amplification. The same result can be
achieved by using a reverse primer that has about 10-50, or
usually 15-30, contiguous nucleotides from the complement of
the wildtype 2C19 sequence shown in Fig. 15 or 16 (i.e., the
antisense strand) including the nucleotide at position 681.
Such a reverse primer can be paired with any suitable forward
primer sufficiently complementary to a subsequence of the
antisense strand of the 2C19 gene upstream from nucleotide 681
to hybridize therewith. The 681 nucleotide should again be at
or near the 3' end of the reverse primer.
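
The placement rule described above (polymorphic nucleotide at the 3' end of the allele-specific primer) can be sketched in Python as follows. This is an illustration under stated assumptions, not the primer design actually used: the function names are hypothetical, the sequence in the usage example is a toy string rather than 2C19 sequence, and `position` is a 1-based index into whatever sense-strand string is supplied, which need not coincide with the numbering of the figures.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def _check_length(length):
    if not 10 <= length <= 50:
        raise ValueError("the text suggests primers of about 10-50 nucleotides")


def forward_allele_primer(sense, position, length=20):
    """Forward primer (5'->3') whose 3'-terminal base is the polymorphic nucleotide."""
    _check_length(length)
    start = position - length          # 0-based index of the primer's first base
    if start < 0:
        raise ValueError("not enough upstream sequence for this primer length")
    return sense[start:position]


def reverse_allele_primer(sense, position, length=20):
    """Reverse primer (5'->3') whose 3'-terminal base pairs with the polymorphic nucleotide."""
    _check_length(length)
    end = position - 1 + length        # 0-based exclusive end of the copied segment
    if end > len(sense):
        raise ValueError("not enough downstream sequence for this primer length")
    return reverse_complement(sense[position - 1:end])


if __name__ == "__main__":
    toy_sense = "AAAAACCCCCGGGGGTTTTT"   # toy sequence, polymorphic base at position 11
    print(forward_allele_primer(toy_sense, 11, length=10))
    print(reverse_allele_primer(toy_sense, 11, length=10))
```

Passing the wildtype sequence yields a wildtype-selective primer and passing the mutant sequence yields a mutant-selective one, since the only difference lies in the 3'-terminal base.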

Selective amplification of a 681 mutant allele is
accomplished by an analogous strategy in which primers are
designed to hybridize to the mutant allele without hybridizing
to the wildtype. A suitable forward primer for amplification
comprises about 10-50, or usually 15-30, contiguous nucleotides
from the mutant 2C19 sequence shown in Fig. 15 or 16 (i.e.,
the sense strand). The forward primer can be paired with any
suitable reverse primer sufficiently complementary to the
sense strand of a downstream 2C19 subsequence to hybridize
therewith. Alternatively, the same result can be achieved
using a reverse primer comprising about 10-50 or 15-30
contiguous nucleotides from the complement of the mutant 2C19
sequence shown in Fig. 15 or 16 (i.e., the antisense strand).
Such a reverse primer can be paired with any suitable forward
primer sufficiently complementary to the antisense strand of
an upstream 2C19 subsequence to hybridize therewith.

For diagnosis of the 636 polymorphism, selective
amplification of the wildtype allele of 2C19 can be
accomplished using a forward primer that has about 10-50, and
usually 15-30, nucleotides from the wildtype 2C19 genomic
sequence shown in Fig. 17, including nucleotide 636. Such a
forward primer, when paired with any suitable reverse primer
downstream from nucleotide 636 (i.e., sufficiently
complementary to the sense strand of 2C19 to hybridize
therewith), can be used to amplify selectively the wildtype
allele without amplifying a mutant allele. The 636 nucleotide
usually occurs near, or preferably at, the 3' end of the
primer. The same result can be achieved by using a reverse
primer that has about 10-50, or usually 15-30, contiguous
nucleotides from the complement of the wildtype 2C19 genomic
sequence shown in Fig. 17 (i.e., the antisense strand)
including the nucleotide at position 636. Such a reverse
primer can be paired with any suitable forward primer
sufficiently complementary to a sequence of the antisense
strand of the 2C19 gene upstream from nucleotide 636 to
hybridize therewith. The 636 nucleotide should again be at or
near the 3' end of the reverse primer.

For selective amplification of a 636 mutant allele, a
suitable forward primer for amplification comprises about
10-50, or usually 15-30, contiguous nucleotides including
nucleotide 636 from the mutant 2C19 genomic sequence shown in
Fig. 17 (i.e., the sense strand). The forward primer can be
paired with any suitable reverse primer sufficiently
complementary to the sense strand of a 2C19 genomic
subsequence downstream from nucleotide 636 to hybridize
therewith. Alternatively, the same result can be achieved
using a reverse primer comprising about 10-50 or 15-30
contiguous nucleotides including nucleotide 636 from the
complement of the mutant 2C19 sequence shown in Fig. 17 (i.e.,
the antisense strand). Such a reverse primer can be paired
with any suitable forward primer sufficiently complementary to
the antisense strand of a 2C19 subsequence upstream from
nucleotide 636 to hybridize therewith.

Following amplification, the sample under test is
characterized as wildtype or mutant by the presence or absence
of an amplification product. With a primer designed for
selective amplification of the wildtype allele, the presence
of an amplification product is indicative of that allele and
the absence of an amplification product is indicative of a
mutant allele. The converse applies for primers designed for
selective amplification of a mutant allele. In a preferred
assay, a sample is divided into two aliquots, one of which is
amplified using primers for wildtype allele amplification, the
other of which is amplified using primers appropriate for
mutant allele amplification. The presence of an amplification
product in one but not both of the aliquots indicates that the
individual under test is either homozygous wildtype or
homozygous for the mutation (depending on the aliquot in which
the amplification product occurred). The presence of
amplification product in both aliquots indicates that the
individual is heterozygous. The absence of an amplification
product in both aliquots would indicate either the absence of
a 2C19 gene or a quality control problem in the amplification
procedure requiring that the assay be repeated.
Coamplification of a second known standard human gene using a
second set of primers can aid in distinguishing between these
possibilities. If both bands are missing, the problem is
probably quality control, while amplification of only the
standard gene suggests that the CYP2C19 gene may be deleted.
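
The interpretation of the two-aliquot assay, including the optional coamplified standard gene, amounts to a small decision table. The Python sketch below restates it; the function name, argument names and result strings are hypothetical and chosen only for illustration.

```python
def interpret_aliquots(wt_product, mut_product, control_product=None):
    """Interpret the two-aliquot allele-specific amplification assay.

    wt_product / mut_product: True if a product was seen in the aliquot
    amplified with wildtype-specific / mutant-specific primers.
    control_product: True or False if a standard gene was coamplified,
    None if no control was included.
    """
    if wt_product and mut_product:
        return "heterozygous"
    if wt_product:
        return "homozygous wildtype"
    if mut_product:
        return "homozygous mutant"
    # No 2C19 product in either aliquot.
    if control_product is None:
        return "indeterminate: repeat assay (gene absent or quality-control problem)"
    if control_product:
        return "2C19 gene may be deleted"
    return "probable quality-control problem: repeat assay"


if __name__ == "__main__":
    print(interpret_aliquots(True, False))
    print(interpret_aliquots(False, False, control_product=True))
```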

The presence or absence of amplification products
can be detected by gel electrophoresis. Gels are usually
visualized by ethidium bromide staining. However, if greater
sensitivity is required, fragments can be labelled in the
course of amplification. Amplified fragments can be
electrophoresed directly or can be cut with any restriction
enzyme that releases fragments of a convenient size from the
amplification products. For the simultaneous analysis of
multiple samples, the dot-blot method may be advantageous. In
the dot-blot method, multiple unlabelled amplification
mixtures are bound to discrete locations on a solid support,
such as a membrane. The membrane is incubated with labelled
probe under suitable hybridization conditions, the
unhybridized probe removed by washing, and the filter
monitored for the presence of bound probe.

c. Southern Blotting

For polymorphic mutations resulting in loss or
acquisition of a restriction site (such as the 681 and 636
polymorphisms), samples of genomic DNA can also be analyzed by
Southern blotting without the need for prior amplification.
The DNA is digested with an enzyme that cuts a wildtype allele
but not a mutant allele or vice versa (e.g., BamHI, SmaI,
EcoRII or HinfI, or isoschizomers of any of these). For
analysis of the 681 polymorphism, digestion with SmaI or
isoschizomers results in an additional fragment from the
wildtype allele compared with the mutant allele. Digestion
with EcoRII or isoschizomers results in an additional fragment
from the mutant allele. Digestion products are detected with
a 2C19 probe. For analysis of the 636 polymorphism, digestion
with BamHI or isoschizomers results in an additional fragment
from the wildtype allele compared with the mutant allele.
Digestion with HinfI results in an additional fragment from
the mutant allele. The probe can be any segment of a 2C19 DNA
sequence that includes the polymorphism and extends for at
least about 20 nucleotides on either side.



2. cDNA Assays 

Defined polymorphisms can also be detected by
analysis of cDNA by similar strategies to those employed for
genomic DNA. However, the primers appropriate for
amplification procedures are not necessarily interchangeable
for the two substrates. Suitable primers for analysis of the
681 and 636 polymorphisms in cDNA are described below.

a. Amplification of Segments Spanning a Defined Mutation

The 681 polymorphism in genomic DNA results in
a 40 bp deletion of cDNA comprising nucleotides 643-682 of the
wildtype 2C19 cDNA or genomic sequence shown in Fig. 12. The
forward and reverse primers are therefore designed to
hybridize to 2C19 subsequences on opposite sides of this
deletion. Thus, for example, a forward primer can hybridize
to the antisense strand of a 2C19 sequence upstream from
nucleotide 643 of the coding region. Such a forward primer
should be paired with a reverse primer that hybridizes to the
sense strand of the 2C19 sequence downstream from nucleotide
682. Nucleotides in a 2C19 DNA sequence are designated the
numbers of corresponding nucleotides in the wildtype cDNA
sequence shown in SEQ. ID. No. 2 (or Fig. 12, which shows a
subsequence of SEQ. ID. No. 2), when the sequences are
maximally aligned. Preferably, the forward primer comprises
about 10-50 or 15-30 contiguous nucleotides upstream of
nucleotide 645 from the wildtype 2C19 cDNA sequence shown in
Fig. 12 or SEQ. ID. No. 2. Analogously, the reverse primer
preferably comprises about 10-50 or 15-30 contiguous
nucleotides from the complement of the wildtype 2C19 cDNA
sequence shown in Fig. 12 or SEQ. ID. No. 2 downstream from
nucleotide 682 of the coding region. For example, a forward
primer comprising 5'-ATTGAATGAAAACATCAGGATTG-3' (SEQ. ID.
No. 59) and a reverse primer comprising
5'-GTAAGTCAGCTGCAGTGATTA-3' (SEQ. ID. No. 60) form a suitable
pair. The amplification product from such primers is 40 bp
longer for the wildtype 2C19 cDNA sequence than for the 681
mutant sequence.

For detection of the 636 polymorphism, the forward
and reverse primers are designed to hybridize to 2C19
subsequences on opposite sides of nucleotide 636. Thus, for
example, a forward primer can hybridize to the antisense
strand of a 2C19 sequence upstream from nucleotide 636 of the
coding region. Such a forward primer should be paired with a
reverse primer that hybridizes to the sense strand of the 2C19
sequence downstream from nucleotide 636 (SEQ. ID. No. 2 or
Fig. 12). Preferably, the forward primer comprises about
10-50 or 15-30 contiguous nucleotides upstream of nucleotide 636
from the wildtype 2C19 cDNA sequence shown in Fig. 12 or SEQ.
ID. No. 2. Analogously, the reverse primer preferably
comprises about 10-50 or 15-30 contiguous nucleotides from the
complement of the wildtype 2C19 cDNA sequence shown in Fig. 12
or SEQ. ID. No. 2 downstream from nucleotide 636 of the coding
region.

For simultaneous detection of the 636 and 681
polymorphisms, the forward primer should be as described for
detection of the 636 polymorphism and the reverse primer as
described for detection of the 681 polymorphism. These
primers will amplify a segment of DNA spanning both the 636
and 681 polymorphisms.

Amplification products are usually analyzed by gel
electrophoresis. The products can be analyzed uncut or can be
cleaved with any restriction enzyme having a site in the
amplification product. For detection of the 681 polymorphism,
SmaI and its isoschizomers are particularly useful because
wildtype 2C19 DNA contains a restriction site that is not
present in the mutant form. See Fig. 12. Similarly, BamHI
and its isoschizomers are particularly useful for detection
of the 636 polymorphism. Analysis of fragments allows
distinction between wildtype individuals and individuals
homozygous or heterozygous for a mutation, as discussed for
the corresponding genomic assay.

b. Selective Amplification of an Allelic Variant

For analysis of the 681 polymorphism, selective
amplification of the wildtype variant is achieved by selecting
a forward or reverse primer that overlaps nucleotides 643-682
of the wildtype 2C19 cDNA sequence (Fig. 12). This segment of
nucleotides is not present in a mutant allele. Thus, a primer
hybridizing to this segment of the wildtype allele will not
hybridize to the mutant allele. Accordingly, such primers can
be used to prime amplification of the wildtype allele without
priming amplification of the mutant allele. For example, a
forward primer that hybridizes to the complement of the
wildtype 2C19 cDNA sequence shown in Fig. 12 between
nucleotides 643-682 without hybridizing to the complement of
the mutant 2C19 DNA sequence shown in Fig. 12 is suitable.
Such a forward primer can be paired with any suitable reverse
primer sufficiently complementary with a downstream
subsequence of the sense strand of the 2C19 cDNA to hybridize
therewith.

Alternatively, a reverse primer is designed that
hybridizes to the wildtype 2C19 cDNA sequence shown in Fig. 12
between nucleotides 643 and 682 without hybridizing to the
mutant 2C19 cDNA sequence shown in Fig. 12. Such a reverse
primer can be paired with any suitable forward primer
sufficiently complementary with an upstream subsequence of the
antisense strand of the 2C19 cDNA to hybridize therewith.

Primers for selective amplification of the mutant
allele can also be designed. A suitable primer hybridizes to
two 2C19 subsequences, of about 1-50, 5-30 or 10-20
nucleotides, which subsequences are separated by nucleotides
643-682 in the wildtype sequence, but which are contiguous in
the mutant sequence. Such primers hybridize to mutant 2C19
cDNA sequences without hybridizing to wildtype sequences. For
example, a forward primer comprising a subsequence of
nucleotides 633-642 of the wildtype 2C19 cDNA sequence shown
in Fig. 12 joined to a second subsequence of nucleotides
684-693 of this sequence is suitable. This primer can be paired
with any suitable reverse primer sufficiently complementary to
a downstream subsequence of the sense strand of the 2C19 cDNA
to hybridize therewith.
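
A junction-spanning primer of this kind is simply two coordinate ranges of the wildtype sequence joined end to end. The Python sketch below shows the slicing with 1-based, inclusive coordinates; the function name is hypothetical and the usage example runs on a toy string, not on the 2C19 cDNA. With the actual wildtype cDNA sequence, the ranges quoted in the text would be passed as `left=(633, 642)` and `right=(684, 693)`.

```python
def junction_primer(wildtype, left, right):
    """Join two 1-based, inclusive coordinate ranges of the wildtype sequence.

    The result matches a sequence in which the nucleotides between the two
    ranges are absent, and is interrupted by them in the wildtype.
    """
    (l_start, l_end), (r_start, r_end) = left, right
    if not (1 <= l_start <= l_end < r_start <= r_end <= len(wildtype)):
        raise ValueError("ranges must be ordered, non-overlapping and within the sequence")
    return wildtype[l_start - 1:l_end] + wildtype[r_start - 1:r_end]


if __name__ == "__main__":
    toy_wildtype = "AAAAATTTTTCCCCC"   # toy sequence; TTTTT plays the deleted segment
    print(junction_primer(toy_wildtype, left=(1, 5), right=(11, 15)))
```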

For analysis of the 636 polymorphism, primers can
be designed using the same strategy as discussed for selective
amplification of genomic DNA, except that the primers, which
include nucleotide 636, are formed from nucleotide segments
from cDNA rather than genomic sequences.

Amplification products are analyzed using the same
methods as described for corresponding genomic amplification
products.

V. Diagnostic Kits

The invention also provides kits comprising useful
components for practicing the diagnostic methods of the
invention. The kits comprise at least one of the primers
discussed above. Kits usually contain a matched pair of
forward and reverse primers as described above for amplifying
a segment encompassing the 681 and/or the 636 polymorphism.
Some kits contain two matched pairs of primers, e.g., one pair
for analysis of the 681 polymorphism, the other pair for
analysis of the 636 polymorphism. For selective amplification
of mutant or wildtype alleles, kits usually contain a pair of
primers for amplification of the mutant allele and/or a
separate pair of primers for amplification of the wildtype
allele. Optional additional components of the kit include,
for example, restriction enzymes for analysis of amplification
products, such as BamHI, SmaI, HinfI and/or EcoRII (or
isoschizomers of any of these), reverse transcriptase or
polymerase, the substrate nucleoside triphosphates, means used
to label (for example, an avidin-enzyme conjugate and enzyme
substrate and chromogen if the label is biotin), and the
appropriate buffers for reverse transcription, PCR, or
hybridization reactions. Usually, the kit also contains
instructions for carrying out the methods.

G. Nucleic Acid Fragments 

In another aspect, the invention provides fragments
of a mutant 2C19 allele spanning the 681 polymorphism and/or
636 polymorphism. The fragments usually have up to about 50,
100, 200, 500, 1000, 2000 or 10,000 bp of 2C19 sequence. Some
fragments comprise at least about ten contiguous nucleotides
including nucleotide 681 from the mutant 2C19 allele shown in
Fig. 15. Other fragments comprise at least about ten
contiguous nucleotides including nucleotide 636 from the
mutant 2C19 allele shown in Fig. 17. The fragments can be
single or double stranded. The fragments are provided in
substantially purified form. Usually, the fragments are the
result of PCR amplification. The fragments are useful in the
diagnostic assays discussed above.

The following examples are provided to illustrate 
but not to limit the invention. 



EXAMPLES

Materials. Human liver samples were obtained from
organ donors through the National Disease Research Interchange
in Philadelphia, PA, and from the Human Liver Research
Facility, Stanford Research Institute, Life Sciences Division,
Menlo Park, CA. Restriction endonucleases were purchased from
Pharmacia LKB Biotechnology, Inc. (Piscataway, NJ).
[α-32P]dCTP (3000 Ci/mmol), [γ-32P]ATP (500 Ci/mmol) and
[α-35S]dATP (650 Ci/mmol) were from Amersham Corp. (Arlington
Heights, IL). All other reagents were of the highest quality
available.

Conditions. Hybridization and washing conditions
for screening libraries with random-labeled cDNAs for 2C13 (g)
or 254c used the same solutions as described for actin, but
were performed at nonstringent temperatures (42°C).
Conditions for hybridization of clones with T300R were
identical with those described above. Hybridization of cDNA
clones with M300R (recognizes 2C9, 2C10, and 2C19)
(5'-ACTTTTCAATGTAAGCAAAT-3') (SEQ. ID. No. 17) was identical
except that for each oligomer the hybridization temperature
and the high-stringency wash were 5°C below the calculated
melting temperatures.
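
The text does not state how the melting temperatures were calculated. As an illustration only, the Python sketch below uses the Wallace rule, a common rule of thumb for short oligonucleotides (2°C per A or T, 4°C per G or C), and then subtracts the 5°C offset mentioned above. The choice of formula and the function names are assumptions, not the method of the source.

```python
def wallace_tm(oligo):
    """Approximate melting temperature (deg C) by the Wallace rule: 2(A+T) + 4(G+C)."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    if at + gc != len(oligo):
        raise ValueError("oligo must contain only A, C, G and T")
    return 2 * at + 4 * gc


def hybridization_temperature(oligo, offset=5):
    """Temperature `offset` deg C below the estimated melting temperature."""
    return wallace_tm(oligo) - offset


if __name__ == "__main__":
    probe = "ACTTTTCAATGTAAGCAAAT"   # oligomer of SEQ. ID. No. 17, as quoted in the text
    print(wallace_tm(probe), hybridization_temperature(probe))
```

Other estimators (for example nearest-neighbour models with salt correction) would give different values, so the numbers produced here should be read as rough guides only.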

Example 1: Construction and Screening of Human Liver cDNA
Libraries

Two cDNA libraries were constructed from human
livers 860624 and S33, which differed phenotypically in the
hepatic content of P450 HLx (2C8) (SEQ. ID. No. 8). Several
partial cDNA clones were found but no full-length clones.

A second cDNA library (from a liver phenotypically 
high in HLx) was then screened. Eighty-three essentially 
full-length (>1.8 kb) clones belonging to the 2C subfamily 
were isolated from this library. These include full-length 
clones for two additional new members of the 2C subfamily. 

The majority of the cDNAs characterized in the high- 
HLx library (60%) were one of two allelic variants of 2C9, 
while 35% represented 2C8 (SEQ. ID. No. 8). Two new genes 
were identified (2C18, represented by two allelic variants, and 2C19).

The two cDNA libraries from individuals
phenotypically high and low in HLx were examined to determine
whether a variant mRNA for 2C8 (SEQ. ID. No. 8) was
responsible for the polymorphic expression of HLx and to
identify additional members of the 2C subfamily. No clones
for 2C8 (SEQ. ID. No. 8) were isolated from the
phenotypically low individual. Two allelic variants for 2C9
were isolated. In addition, full-length cDNAs for two
additional new members (2C18 and 2C19) were isolated. These
new members of the 2C subfamily were expressed in COS-1 cells
and shown to be immunochemically distinct from HLx and 2C9,
and 2C18 metabolized racemic mephenytoin.

Total human liver RNA was prepared by the guanidine
hydrochloride method (Cox, Methods Enzymol. 12:120-129 (1968))
from two human livers either low (860624) or high (S33) in HLx
as identified by immunoblot analysis. Poly(A+) RNA was then
isolated by two passages over an oligo(dT)-cellulose column
(Aviv et al., Proc. Natl. Acad. Sci. U.S.A. 69:1408-1412
(1972)). The low-HLx cDNA library was prepared by Stratagene
Cloning Systems (La Jolla, CA), and the double-stranded cDNA
was treated with S1 nuclease. Following the addition of EcoRI
linkers, the double-stranded cDNA was size-fractionated on a
CL-4B Sepharose column. The largest fraction was ligated into
λZAPII and then transfected into XL1-Blue. The high-HLx cDNA
library was constructed following the methods of Watson et
al., in DNA Cloning (Glover, D.M., Ed.) 1:79-88, IRL Press,
Washington, D.C. (1985). Double-stranded cDNA was ligated to
EcoRI linkers, size-fractionated on an agarose gel (1.8-2.4
kb), and then ligated into λZAPII (Stratagene) and transfected
into XL1-Blue.

The low-HLx library was screened under conditions of
low stringency with a 32P-labeled rat P450 2C13 cDNA probe and
with oligonucleotides for human 2C8 (SEQ. ID. No. 8) (T300R)
(5'-TTAGTAATTCTTTGAGATAT-3') (SEQ. ID. No. 18) and 2C9 (M300R)
(5'-CTGTTAGCTCTTTCAGCCAG-3') (SEQ. ID. No. 19). The high-HLx
library was screened under conditions of low stringency using
a 32P-labeled 254c cDNA probe derived from the first library
and M300R (2C9). Positive clones were isolated, transfected
into XL1-Blue, and excised into the plasmid Bluescript,
according to Stratagene's excision protocol.

Screening the cDNA library constructed from a
low-HLx individual with a cDNA for rat 2C13 under nonstringent
conditions and with oligonucleotide probes specific for 2C8
(SEQ. ID. No. 8) and 2C9 yielded several clones for 2C9 and a
partial DNA, clone 254c, which now appears to be an
incompletely characterized splice variant of the P450 2C
subfamily. None of the clones identified in this library were
full-length. Clone 186 was identical with but 25 base pairs
longer than MP-4, a 2C9 clone previously described by Ged et
al. (1988).

Approximately 40,000 plaques were then screened from
the library from liver S33 with the cDNA for 254c under
nonstringent conditions and with an oligonucleotide probe
specific for 2C9. Eighty-three essentially full-length 2C
clones (>1.8 kb) were isolated, purified, and partially or
completely sequenced (Table I). Of these, 29 clones were
found to encode cytochrome P450 2C8 (SEQ. ID. No. 8). One
clone (7b) of 2C8 (SEQ. ID. No. 8) was isolated which was
similar to Hp1-1 and Hp1-2 reported by Okino et al. (1987), but
different by having a tyrosine at position 130 instead of an
asparagine and an isoleucine at 264 instead of a methionine.

TABLE I

Distribution of P450 2C cDNA clones from
Human Liver S33*

                               No. of Clones   % Distribution

2C8 (SEQ. ID. No. 8)                 29              35
2C9
  65 (SEQ. ID. No. 10)               39              47
  25 (SEQ. ID. No. 4)                11              13
2C10                                  0               0
2C18
  29c (SEQ. ID. No. 6)                1               1.2
  6b (SEQ. ID. No. 12)                2               2.5
2C19 (11a) (SEQ. ID. No. 2)           1               1.2
Total                                83             100

* Clones were classified by hybridization with specific
oligonucleotide probes and partial sequencing.
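
The percentage column of Table I can be recomputed from the clone counts. The Python sketch below does so; the dictionary keys are labels chosen here for illustration, and values computed this way may differ in the last digit from the printed column because of rounding.

```python
# Clone counts taken from Table I; the keys are illustrative labels.
CLONE_COUNTS = {
    "2C8": 29,
    "2C9 (clone 65)": 39,
    "2C9 (clone 25)": 11,
    "2C10": 0,
    "2C18 (clone 29c)": 1,
    "2C18 (clone 6b)": 2,
    "2C19 (clone 11a)": 1,
}


def percent_distribution(counts):
    """Return each entry's share of the total number of clones, in percent."""
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}


if __name__ == "__main__":
    print("total clones:", sum(CLONE_COUNTS.values()))
    for name, pct in percent_distribution(CLONE_COUNTS).items():
        print(f"{name:<20s}{pct:6.1f}")
```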

There are a number of polymorphisms in the human
CYP2C subfamily. These include variations in the hepatic
levels of HLx (Wrighton et al., Arch. Biochem. Biophys.
306:240-245 (1987)) and metabolic variations in the hepatic
metabolism of S-mephenytoin. The molecular basis for these
polymorphisms has not been characterized. 2C8 (SEQ. ID. No.
8) appears to encode the protein for HLx on the basis of its
N-terminal amino acid sequence (Okino et al., J. Biol. Chem.
262:16072-16079 (1987); Wrighton et al., supra; Lasker et al.,
Biochem. Biophys. Res. Commun. 148:232-238 (1987)).

Example 2: Sequence Analysis

The Bluescript plasmids containing the positive cDNA
inserts from the low-HLx library were purified by CsCl
gradients, while the plasmids containing cDNA inserts from the
high-HLx library were purified by using Qiagen plasmid
purification kits (Qiagen, Inc., Studio City, CA). The
double-stranded cDNA inserts were sequenced by the dideoxy
chain termination method reported in Sanger et al., J. Mol.
Biol. 162:729-773 (1982), using Sequenase kits (U.S.
Biochemical Corp., Cleveland, OH). The full-length clones 65
(SEQ. ID. No. 10), 25 (SEQ. ID. No. 4), 7b, 11a (SEQ. ID.
No. 2), 29c (SEQ. ID. No. 6) and 6b (SEQ. ID. No. 12) were
sequenced completely in both directions with primers spaced
approximately 20 bases apart. The remaining positive clones
from the high-HLx cDNA library were sequenced in both
directions through both the 5' and 3' ends and through all the
regions which would identify any of the known allelic
variants.

The majority of the clones (50) isolated from the
library from liver S33 coded for 2C9. Interestingly, all of
the 50 clones appeared to be one of two 2C9 allelic variants,
typified by the full-length clones 65 (SEQ. ID. No. 10) and 25
(SEQ. ID. No. 4). All of these clones were sequenced through
the 5' and 3' ends and through regions which would identify
known allelic variants. Thirty-nine of the 2C9 clones were
identical with clone 65 (SEQ. ID. No. 10), and 11 were
identical with clone 25 (SEQ. ID. No. 4).

The nucleotide sequence for clone 65 (SEQ. ID. No.
10) and clone 25 (SEQ. ID. No. 4) is shown in Figure 2.
Clones 25 (SEQ. ID. No. 4) and 65 (SEQ. ID. No. 10) were
identical in the 5'- and 3'-noncoding regions but contained
two single-base changes at positions 1075 and 1425. One of
these base changes was conservative, but the second would
result in one amino acid difference at position 359
(isoleucine versus leucine). Clone 65 (SEQ. ID. No. 9) is
identical in amino acid sequence with human form 2, although
it differs by two silent changes in the coding region and four
differences in the noncoding region (Yasumori et al., 1987).
Clone 65 (SEQ. ID. No. 9) contained a leucine instead of an
isoleucine at position 4, a valine instead of a serine at
position 6, and an arginine instead of a cysteine at position
144 compared to the 2C9 sequenced by Kimura et al. (1987).
The 2C9 reported by Meehan et al. has substitutions at
positions 144, 175, and 238 compared to the clones obtained in
this invention (Meehan et al., Am J Hum Genet., 42:26-37
(1988)).

The remaining clones characterized from the human
liver S33 cDNA library encode several novel P450 2C cDNAs.
Their DNA sequences are shown in Figure 2 and their percent
homology with other known 2C members shown in Table II. Two
of these clones, 29c (SEQ. ID. No. 6) and 6b (SEQ. ID. No.
12), differ by one nucleotide in the coding region (position
1154), which would result in a single amino acid change
(threonine vs methionine at position 385). Clone 29c (SEQ.
ID. No. 6) had a very long (198 bp) 5'-noncoding sequence and
a polyadenylation signal 21 bases from the poly(A) tail.
Clone 6b (SEQ. ID. No. 12) had an unusually long 3'-noncoding
region containing three possible polyadenylation signals with
no poly(A) tail. The differences in the 3'-noncoding region
could represent alternate splicing, allelic variants, or
possibly separate genes. However, these clones are designated
as allelic variants of 2C18 because they differ by only one
base in the coding region. They are most similar to 2C9 (82%
amino acid homology) and 2C19 (SEQ. ID. No. 2) (81% amino acid
homology) (Table II).
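
The relation between the nucleotide positions and amino acid positions quoted above can be checked with simple codon arithmetic. The Python sketch below assumes that nucleotide numbering starts at the first base of the initiator codon, which the text does not state explicitly; the function name is hypothetical.

```python
def codon_of(nucleotide):
    """Map a 1-based coding-sequence position to (codon number, position within codon).

    Assumes position 1 is the first base of the initiator codon.
    """
    if nucleotide < 1:
        raise ValueError("nucleotide positions are 1-based")
    index = nucleotide - 1
    return index // 3 + 1, index % 3 + 1


if __name__ == "__main__":
    for position in (1075, 1425, 1154):
        codon, offset = codon_of(position)
        print(f"nucleotide {position}: codon {codon}, base {offset} of the codon")
```

Under that assumption, nucleotide 1075 falls in codon 359 and nucleotide 1154 in codon 385, in agreement with the amino acid positions given in the text, while nucleotide 1425 is the third base of its codon.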

A third unique P450 2C cDNA, clone 11a (SEQ. ID.
No. 2) (designated 2C19), was also identified. 2C19 is 92%
homologous in its amino acid sequence to 2C9, 81% homologous
to 2C18, and 79% homologous to 2C8 (SEQ. ID. No. 8). Clone
11a (SEQ. ID. No. 2) had a short 5'-leader sequence and
contained the stop codon, but did not have a polyadenylation
signal or poly(A) tail. Interestingly, no clones for 2C10
(MP-8) were isolated from either library, despite the
sequencing of the 3' region of all 50 putative 2C9 clones.

TABLE II

Percent Homology for Nucleotide
and Amino Acid Sequences of P450 2C cDNAs*

                    2C8                       29c (2C18)        11a (2C19)
Clone               (SEQ ID NO. 8)    2C9     (SEQ ID NO. 6)    (SEQ ID NO. 2)

29c (2C18)               84            86          100               86
(SEQ ID NO. 6)           89            93          100               93

11a (2C19)               83            94           86              100
(SEQ ID NO. 2)           91            96           93              100

* For each comparison, the upper value represents percent
nucleotide homology, and the lower value represents
percent amino acid homology. The nucleic acid
comparisons include both the coding and 3'-noncoding
regions. The 2C9 sequence used in this comparison was
the cDNA sequence for clone 65.

Figure 4 shows the alignment comparisons for the
deduced amino acid sequences of all known members of the human
CYP2C family, including the three new P450s of the present
invention. The 7 proteins, along with the consensus sequence,
can be aligned with no gaps, and each is predicted to be 490
amino acids long. The amino acid sequences show marked
similarities with many regions of absolute conservation.
Regions of marked conservation are noted from 131 to 180, and
from 302 to 460. These human P450 2C protein sequences also
demonstrate hypervariable regions which may be important for
interactions between the enzyme and substrate. These include
the regions from 181-210 and 220-248 as well as 283-296 and a
short region near the carboxyl terminus at 461-479. Notably,
it has been reported that a putative recognition site for
phosphorylation of P450 by cAMP-dependent kinase for P450 2B1
(Arg-Arg-Phe-Ser) at positions 124-127 was conserved in 2C8
(SEQ. ID. No. 8), 2C9, and 11a (2C19), suggesting that these
cytochromes might be regulated by phosphorylation (Müller et
al., FEBS Lett. 187:21-24 (1985)).

However, 2C18 did not contain a serine at this site.
The overall percent homology for both nucleic acid and protein
sequences is summarized in Table II.

Two additional full-length allelic variants of 2C9
have been isolated. One of these clones is identical with
MP-4, but is full-length. It varies from the almost full-length
human form 2 isolated by Yasumori et al., supra, by only two
silent base changes in the coding region and by four changes
in the noncoding region. The number of differences in the
nucleic acid sequences of the presumed allelic variants
isolated by different laboratories ranges from 4 to 17, and the
amino acid changes vary from 0 to 4, as illustrated in
Figure 3. Two of the amino acid differences occur within the
first six N-terminal residues, the others occurring singly
throughout the sequence. The effect of these changes on
catalytic activity has not been systematically studied. In
Relling et al., J. Pharmacol. Exp. Ther. 252:442-447 (1990),
it was reported that the cDNAs for 2C8 (SEQ. ID. No. 8)
and 2C9, when expressed, 4-hydroxylated racemic mephenytoin
but did not metabolize (S)-mephenytoin. However, the form of
isolated 2C9 (human form 2) which is described in Yasumori et
al. (1990) metabolized (S)-mephenytoin preferentially when
expressed in yeast. These forms differed by only three amino
acids. In contrast, Brian et al., Biochemistry 28:4993-4999
(1989) found that when a full-length MP-8 (constructed with the
first 15 nucleotides predicted from the known amino acid
sequence of P450MP-1) was expressed in yeast, it did not
metabolize (S)-mephenytoin. This form would differ from human
form 2 by only two amino acids. Thus, the role of 2C9 in
(S)-mephenytoin metabolism remains controversial.

Example 3: Human RNA Blot Analysis and Hybridization
Conditions

Poly(A+) RNA (10 μg) was electrophoresed in a 1%
agarose gel under denaturing conditions and transferred to a
Nytran filter (Micron Separation, Inc., Westboro, MA), and
filters were then baked for 2 h at 80°C. The filters were
prehybridized for 2 h, then hybridized overnight with a
32P-labeled specific oligonucleotide probe for 2C8 (SEQ. ID.
No. 8) (T300R) at 42°C, washed 3 x 5 min at room temperature
and 1 x 5 min at 42°C with 2 x SSC/0.1% SDS, and
radioautographed. Filters were then stripped with 5 mM Tris
(pH 8.0), 0.2 mM EDTA, 0.05% sodium pyrophosphate, and 0.1 x
Denhardt's for 2 h at 65°C and rehybridized with a
random-primed actin cDNA (Oncor, Gaithersburg, MD) at 50°C using
6 x SSC, 4 x Denhardt's, and 0.5% SDS. These filters were washed
1 x 5 min at room temperature, 1 x 10 min at 48°C, and 4 x 15
min at 48°C and radioautographed as before. The 2C8 mRNA
band was quantitated by scanning with an LKB Ultrascan laser
densitometer, and the values of the integrated peaks were
divided by those of the actin peaks.
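The actin normalization described above amounts to a per-sample ratio of integrated peak areas. The following is a minimal Python sketch of that arithmetic. The function name, the sample labels, and the peak areas are hypothetical illustrations and are not data from the study.

```python
def normalize_to_actin(target_peaks, actin_peaks):
    """Divide each sample's integrated 2C8 peak by its actin peak."""
    normalized = {}
    for sample, target_area in target_peaks.items():
        actin_area = actin_peaks[sample]
        if actin_area <= 0:
            raise ValueError(f"actin peak for {sample} must be positive")
        normalized[sample] = target_area / actin_area
    return normalized


# Hypothetical integrated peak areas (arbitrary densitometer units).
target = {"sample_A": 1400.0, "sample_B": 30.0}
actin = {"sample_A": 1000.0, "sample_B": 1500.0}

ratios = normalize_to_actin(target, actin)

# Fold difference between two samples after correcting for loading.
fold = ratios["sample_A"] / ratios["sample_B"]
print(ratios, fold)
```

Dividing by the actin signal corrects for unequal RNA loading between lanes, so fold differences are compared on the normalized ratios rather than on the raw 2C8 peaks.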

Hybridization with T300R was negligible in mRNA from
860624 compared to S33 and a number of other liver samples
(Figure 5). When corrected for hybridization with the actin
probe, the amounts of 2C8 (SEQ. ID. No. 8) mRNA were
consistent with the relative amounts of HLx observed in
Western blot analysis. Laser scans of the autoradiographs
indicated that 2C8 (SEQ. ID. No. 8) mRNA levels in sample
860624 were at least 70-fold lower than in S33 and 3- to
15-fold lower than in any of the remaining samples.

Example 4: Cell Expression Studies 

cDNA inserts were ligated into the cloning region of
the expression plasmids pSVL (Pharmacia LKB Biotechnology,
Inc., Piscataway, NJ) or pcD (Okayama et al., Mol. Cell. Biol.
3:280-289 (1983)) and used to transform COS-1 cells. COS-1
cells were placed at (1-2) x 10^ cells per 1-cm dish and grown
for 24 h in Dulbecco's-modified Eagle's medium with 10% fetal
bovine serum (DMEM). The cells were then washed with
Dulbecco's phosphate-buffered saline (PBS) and transfected
with recombinant plasmid (3 μg per dish) in DEAE-dextran (500
μg/mL) for 30 min-1 h at 37°C. The transfected cells were
then treated with chloroquine (52 μg/mL) in DMEM for 5 h
(Luthman et al., Nucleic Acids Res. 11:1295-1308 (1983)),
washed with PBS, refed with DMEM, and incubated for 72 h prior
to harvest. Typically, 15-20 dishes were transfected with
each recombinant plasmid. For Western blot analysis of the
recombinant transformed COS-1 cells, cells were scraped from
the dishes into buffer (50 mM Tris-HCl, pH 7.5, 150 mM KCl, and
1 mM EDTA) and lysed with 3 x 5 s bursts with a Polytron. A
portion of each lysate was centrifuged at 9000g and then
10000g for the preparation of a microsomal fraction. Western
blots were then performed as described above. Total RNA was
isolated from transfected COS-1 cells, and Northern blots were
performed as described for human samples. The filters were
hybridized with a 32P-labeled oligonucleotide probe which
hybridizes with all 2C clones isolated (2C500R)
(5'-GGAGCACAGCCCAGGATGAA-3') (SEQ. ID. No. 20) at 55°C, and
radioautographed.

The two variant cDNAs for 2C9, the two variant cDNAs
for 2C18, and the cDNA for 2C19 were inserted into expression
vectors and transfected into COS-1 cells. Cell lysates were
prepared and immunoblotted by using antibody to HLx and P450
2C9. The results are shown in Figure 4. Transfection of
COS-1 cells with the two variants of 2C9 (25 (SEQ. ID. No. 4) and
65 (SEQ. ID. No. 10)) resulted in the expression of a protein
(SEQ. ID. No. 3) with a molecular weight equal to that of pure
2C9. In contrast, neither 2C18 (either variant) nor 2C19 was
detected by antibody to HLx or 2C9. However, Northern blot
analysis indicated that all three cDNAs had been successfully
transfected into these cells. The sizes of the transcripts
were those expected for the constructs. The somewhat lesser
hybridization of the 2C oligoprobe with RNA from cells
transfected with 11a (SEQ. ID. No. 2) reflects a lower amount
of RNA in this sample as shown by the hybridization with the
actin probe.

Example 5: Expression of Cytochrome P450 2C19 and 2C18
Polypeptides in a Stable Cell Line
1. Materials

(a) Liver Samples and Chemicals
Human liver samples were obtained from Dr. Fred
Guengerich, University of Vanderbilt, Nashville, TN.
Restriction endonucleases were purchased from Stratagene
Cloning Systems (La Jolla, CA). [α-32P]dCTP (3000 Ci/mmol),
[γ-32P]ATP (5000 Ci/mmol) and [α-35S]dATP (650 Ci/mmol) were
from Amersham Corp. (Arlington Heights, IL). Nirvanol was
obtained from Adrian Küpfer, University of Berne, Switzerland,
and separated into its R- and S-enantiomers as described by
Sobotka et al., J. Amer. Chem. Soc. 54:4697-4702 (1932).
Radiolabelled S- and R-mephenytoin (N-methyl-14C) were
synthesized by E.I. DuPont de Nemours & Co., Inc. (Wilmington,
DE) by methylation of R- and S-nirvanol. The radiochemical
purity of both isomers was greater than 90% as assessed by
HPLC. A single impurity which accounted for less than 2% of
the parent compound was not characterized, since it eluted
after the metabolites and parent compound. Moreover, the
percentage of the impurity remained the same (less than 2%)
before and after incubations. All sequencing was done by the
dideoxy method using Sequenase Kits (U.S. Biochemical Corp.,
Cleveland, OH). The specific activities of the S- and
R-enantiomers were 20.7 and 20.9 mCi/mmol, respectively. All
other reagents used are listed below or were of the highest
quality available.

(b) Additional Sequences of 2C cDNAs Used in the
Expression Studies

Two full-length clones of 2C8 (7b and 7c) described
in Romkes et al., Biochemistry 30:3247-3255 (1991), were
sequenced through the coding region in the present study. The
sequences were similar to that of the 2C8 (HP1-1) reported by
Okino et al., supra; however, both clones had coding changes
at position 390 (A→C) (Asn130→Thr) and G→C at position 792
(Met264→Ile) and a change in the noncoding region at
1497 (T→C). These changes presumably represent a second
allelic variant of 2C8. The Thr130 and Ile264 amino acids
found in our 2C8 clones are conserved in the remainder of the
human P450 2C subfamily (2C9, 2C18, and 2C19) and are
therefore consistent with the amino acid substitutions in
other members of this subfamily.

(c) Yeast Strains and Media

Saccharomyces cerevisiae 334 (MATα, pep4-3, prb1-1122,
ura3-52, leu2-3,112, reg1-501, gal1), a protease-deficient
strain kindly provided by Dr. Ed Perkins (NIEHS),
was used as the recipient strain in these studies and
propagated non-selectively in YPD medium (1% yeast extract, 2%
peptone, 2% dextrose) (Hovland et al., Gene 83:57-64 (1989)).
For the selection of Leu+ transformants, the cells were grown
in synthetic complete medium minus leucine (Rose et al.,
Methods in Yeast Genetics (Rose et al., eds.) pp. 180-187,
C.S.H.P., NY 1990). Plates were made by the addition of 2%
agar.



2. Methods

(a) Amplification of 2C18 and 2C9 RNA for Direct
Sequencing

Total RNA from selected human liver samples was
isolated by the single-step method (Chomczynski et al., Anal.
Biochem. 163:156-159 (1987)), using TRI REAGENT (Mol. Res.
Center, Inc., OH). RNA (10 μg) was reverse transcribed using
2.6 μM random hexamers as the 3'-primer by incubating for
1 hour at 42°C using 2.5 U/μl of M-MLV reverse transcriptase
(BRL, Grand Island, NY) in 10 mM Tris-HCl, pH 8.3, 5 mM KCl,
5 mM MgCl2, 1 U/μl RNase inhibitor (Promega, Madison, WI) and
1 mM each of dATP, dCTP, dGTP, and dTTP (Perkin Elmer Cetus,
Norwalk, CT). The samples were then heated for 5 minutes at
99°C to terminate the reverse transcription.

The cDNA was then amplified for a region containing
the allelic differences in 2C18 and 2C9 using a nested PCR
method. The DNA was amplified in 1X PCR buffer (50 mM KCl,
10 mM Tris-HCl, pH 8.3) containing 1 mM MgCl2, 0.2 mM each of
dATP, dCTP, dGTP, dTTP and 20 pmol of each of the 5'- and 3'-
primers in a final reaction volume of 100 μl. The reaction
mixture was heated at 94°C for 5 minutes before addition of
2.5 U of AmpliTaq DNA polymerase (Perkin Elmer Cetus). For
PCR of 2C18, the 3'-primer was 5'-TGGCCCTGATAAGGGAGAAT-3'
(SEQ. ID. No. 23) and the 5'-primers were
5'-ATCCAGAGATACATTGACCTC-3' (SEQ. ID. No. 24) (outer) and
5'-CCATGAAGTGACCTGTGATG-3' (SEQ. ID. No. 25) (inner). For
2C9, the 3'-primer was 5'-aaagatggataaTGCCCCAG-3' (SEQ. ID.
No. 26) and the 5'-primers were
5'-gaaggagatccggcgtttct-3' (SEQ. ID. No. 27) (outer) and
5'-ggcgtttCTCCCTCATGACG-3' (SEQ. ID. No. 28) (inner). The
outer amplification was
performed for 20 cycles consisting of denaturation at 94°C for
1 minute, annealing at the appropriate temperature for
30 seconds, and extension at 72°C for 1 min. After a 50-fold
dilution, PCR was carried out similarly with the inner primers
for 35 additional cycles.

The PCR products were purified using a Centricon-30,
dried, suspended in 40 μl of sterile water, and sequenced
using Sequenase Kits and a 32P-end-labeled sequencing primer.
For 2C18, the primer used was 2C18.1184R 5'-TTGTCATTGTGCAG-3'
(SEQ. ID. No. 29). Sequencing primers for 2C9 were 2C9.1030F
5'-CACATGCCCTACACA-3' (SEQ. ID. No. 30), 2C9.385F
5'-TGACGCTGCGGAATT-3' (SEQ. ID. No. 31), and 2C9.783F
5'-GGACTTTATTGATTG-3' (SEQ. ID. No. 32).

Full-length 2C9 cDNA was also amplified by PCR from
a human liver with high S-mephenytoin 4'-hydroxylase activity
using the primers 5'-ATGATTCTCTTGTGGTCCT-3' (SEQ. ID. No. 33)
and 5'-AAAGATGGATAATGCCCCCAG-3' (SEQ. ID. No. 34). The PCR
reaction was similar to above, except that the primer
concentrations were increased 10-fold (0.25 μM). The PCR
products were then cloned into the pCR1000 vector using the TA
Cloning System (Invitrogen, San Diego, CA) and sequenced to
identify the allelic variant present.

(b) Plasmid Construction and Methods for Amplifying
Full-length 2C18 and 2C19 cDNAs by PCR

The strategy for cloning the P450 2C cDNAs into the
yeast vector pAAH5 is described below. The 5'-noncoding
sequence of the P450 2C cDNAs was eliminated by PCR
amplification to optimize expression in yeast cells. The
5'-primer introduced a Hind III cloning site and a six A-residue
consensus sequence upstream of the ATG codon to promote
efficient translation in yeast (Hamilton et al., Nucl. Acids
Res. 15:3581-3593 (1987), Cullin et al., Gene 65:203-217
(1988)). The 3'-primer was positioned between the stop codon
and polyadenylation site and introduced a second Hind III
site. cDNA inserts in the pBluescript vector (0.1 μg) (Romkes
et al., (1991), supra) were amplified by PCR as described
before except that the reaction contained 3.5 mM MgCl2,
0.25 μM each of the 5'- and 3'-primers, and 1 μl PerfectMatch
(Stratagene, La Jolla, CA). Amplification was performed in
sequential cycles, with the first cycle including denaturation
for 1 min at 94°C, annealing at the appropriate temperature
for 1 min, and polymerization at 72°C for 3 min. The
remaining 24 cycles consisted of a denaturation step at 94°C
for 1 min and a combined annealing/extension step at 72°C for
3 min. After the last cycle, all samples were incubated an
additional 10 min at 72°C. The primers used were:
2C8: 5'-GCAAGCTTAAAAAAATGGAACCTTTTGTGGTCCT-3' (SEQ. ID.
No. 35) and 5'-GCAAGCTTGCCAGATGGGCTAGCATTCT-3' (SEQ. ID.
No. 36); 2C9: 5'-GCAAGCTTAAAAAAATGGATTCTCTTGTGGTCCT-3' (SEQ.
ID. No. 37) and 5'-GCAAGCTTGCCAGGCCATCTGCTCTTCT-3' (SEQ. ID.
No. 38); 2C19: 5'-GCAAGCTTAAAAAAATGGATTCTCTTGTGGTCCT-3' (SEQ.
ID. No. 39) and 5'-GCAAGCTTGCCAGACCATCTGTGCTTCT-3' (SEQ. ID.
No. 40).

The PCR products were cloned into the pCR1000 vector
(Invitrogen, San Diego, CA). Recombinant plasmids were
isolated from E. coli (INVαF') cells using Qiagen plasmid
purification kits, and the PCR products were completely
sequenced as described above to verify the fidelity of the PCR
reaction. A mutation of Asp2→Val was initially introduced
inadvertently in 29c via the primers utilized due to an error
in the original sequencing at this position. Therefore, the
correct 2C18-Asp2 cDNAs were cloned into the pAAH5 vector by
an alternate strategy. The 3'-end was cut with NdeI, blunted,
and ligated to a SmaI/HindIII adapter. The clone was then
partially digested with BamHI, which cuts after the initiation
ATG as well as internally, and the intact 1700 bp fragment was
gel purified. A BamHI/HindIII linker was prepared from the oligos
5'-AGCTTAAAAAAATG-3' (SEQ. ID. No. 41) (upper) and
5'-GATCCATTTTTTTA-3' (SEQ. ID. No. 42) (lower), annealed, and
ligated to the cDNA fragment to introduce a HindIII cloning
site and regenerate the ATG codon.
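The pairing of the two linker oligonucleotides can be checked with a short Python sketch. It assumes that each oligo carries a 4-nucleotide 5' overhang (AGCT, compatible with a HindIII end, and GATC, compatible with a BamHI end) and that the BamHI-cut cDNA top strand begins with GATCC, as follows from the enzyme's G^GATCC cleavage pattern. The helper name `reverse_complement` is illustrative only.

```python
# Complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (5'->3')."""
    return seq.translate(COMPLEMENT)[::-1]


upper = "AGCTTAAAAAAATG"  # SEQ. ID. No. 41, written 5'->3'
lower = "GATCCATTTTTTTA"  # SEQ. ID. No. 42, written 5'->3'
OVERHANG = 4              # assumed length of each 5' single-stranded end

# The regions expected to base-pair once the oligos are annealed.
upper_paired = upper[OVERHANG:]
lower_paired = reverse_complement(lower[OVERHANG:])
oligos_anneal = upper_paired == lower_paired

# Top strand across the junction after ligation to the BamHI-cut cDNA
# (assumed to start with GATCC on its top strand).
junction = upper + "GATCC"

print(oligos_anneal, upper[:OVERHANG], lower[:OVERHANG], junction)
```

Under these assumptions the duplex region is ten base pairs long, and the ligated top strand reads ATGGATCC across the junction, which restores the initiation codon followed by the BamHI site.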

The PCR-amplified cDNAs were isolated by Hind III
digestion and ligated into the pAAH5 yeast expression vector, and
the proper orientation was confirmed by restriction analysis and
sequencing. The expression vector pAAH5, which contains the
yeast ADH1 promoter and terminator regions and the Leu2
selectable marker, was kindly provided by Dr. M. Negishi
(NIEHS). The recombinant plasmids were isolated from E. coli
DH5α cells using Qiagen plasmid purification kits and
transformed into yeast as described previously (Faletto et
al., J. Biol. Chem. 267:2032-2037 (1992)), using the lithium
acetate method of Ito et al., J. Bacteriol. 153:163-168
(1983).

(c) Immunoblots and Cytochrome P450 Determinations
Yeast microsomes or whole cell lysates were prepared
from transformed cells isolated at mid-logarithmic phase as
described previously (Oeda et al., supra) with slight
modifications (Faletto et al., supra) and stored at -80°C in
0.1 M phosphate (pH 7.4) containing 20% glycerol and 0.1 mM
EDTA. Protein concentrations were determined by the method of
Bradford et al., Anal. Biochem. 72:248-254 (1976).
SDS-polyacrylamide gel electrophoresis and Western blots were
performed on yeast microsomes or whole cell lysates (Faletto
et al., supra) and immunoblots probed with antibody to the
appropriate P450 as described (Yeowell et al., Arch. Biochem.
Biophys. 243:408-419 (1985)). Cytochromes P450 2C8, P450 2C9
and NADPH:P450 reductase were purified from human liver
microsomes (Raucy et al., Methods in Enzymol. 208:577-587
(1991)) and antibodies to 2C8 and 2C9 prepared in rabbits as
previously described (Leo et al., Arch. Biochem. Biophys.
269:305-312 (1988)). Specific peptides
NH2-CIDYLPGSHNKIAENFA-COOH (SEQ. ID. No. 43) (amino acids
231-249) for P450 2C18 and NH2-CLAFMESDILEKVK-COOH (SEQ. ID.
No. 44) (amino acids 236-249) for 2C19 were selected from amino
acid regions where these
P450s vary from other known 2C subfamily members (Romkes et
al., (1991), supra). These peptides were synthesized,
conjugated to bovine serum albumin via
m-maleimidobenzoyl-N-hydroxysuccinimide ester, and antibodies
to the conjugates
raised in rabbits by BIOSYNTHESIS INC. (Denton, TX). E. coli
lysate (4 mg/ml) was added to the primary peptide antibody in
the first step of the immunoblot procedure to block non-specific
reactions of these rabbit antibodies to yeast cell wall
proteins. Cytochrome P450 concentrations of microsomes were
determined by dithionite-reduced carbon monoxide difference
spectra by the method of Omura et al., J. Biol. Chem.
239:2370-2378 (1964) using an extinction coefficient of
91 mM^-1 cm^-1.
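The conversion from a CO difference spectrum to a specific P450 content follows from the Beer-Lambert law with the stated extinction coefficient of 91 mM^-1 cm^-1. The Python sketch below assumes the absorbance difference is read between 450 and 490 nm, as is usual for this method, with a 1 cm light path. The function names and the example readings are hypothetical.

```python
EXTINCTION_MM_CM = 91.0  # mM^-1 cm^-1, the coefficient stated in the text


def p450_nmol_per_ml(delta_a, path_cm=1.0):
    """P450 concentration in nmol/ml from delta A (450 nm minus 490 nm).

    delta_a / (epsilon * path) gives mM; 1 mM equals 1000 nmol/ml.
    """
    return delta_a / (EXTINCTION_MM_CM * path_cm) * 1000.0


def p450_pmol_per_mg(delta_a, protein_mg_per_ml, path_cm=1.0):
    """Specific content in pmol P450 per mg microsomal protein."""
    if protein_mg_per_ml <= 0:
        raise ValueError("protein concentration must be positive")
    nmol_per_mg = p450_nmol_per_ml(delta_a, path_cm) / protein_mg_per_ml
    return nmol_per_mg * 1000.0


# Hypothetical reading: delta A = 0.0182 at 2 mg protein/ml in the cuvette.
content = p450_pmol_per_mg(0.0182, 2.0)
print(content)
```

Expressing the result per mg of protein makes preparations of different dilution directly comparable, which is the form in which expression levels are reported in the Results.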

Microsomes of human livers were prepared as
described by Raucy et al., supra. SDS-polyacrylamide gel
electrophoresis and immunoblot analysis were performed as above
except that immunoblots were developed using the ECL (enhanced
chemiluminescence) Western blotting kit from Amersham (UK).
Immunoblots were scanned with a laser densitometer (LKB
Instruments).

(d) Purification of Cytochromes from Recombinant
Yeast Microsomes

Recombinant yeast microsomes were prepared from a
10-12 l culture, and recombinant P450s were purified by
aminooctylsepharose chromatography as described by Iwasaki et
al., J. Biol. Chem. 266:3380-3382 (1991). The Emulgen was
then removed from protein by adsorption of the protein to a 4 g
hydroxylapatite column (Hypatite C, Clarkson Chemical Company,
Williamsport, PA) equilibrated with 10 mM potassium phosphate
buffer (pH 7.2), 20% glycerol, 0.1 mM EDTA, and 0.1 mM DTT and
washing the column with the same buffer until the absorbance
at 280 nm returned to zero. The P450 was then eluted with
4090 mM DTT, and dialyzed overnight against 100 mM potassium
phosphate buffer (pH 7.4), 20% glycerol and 0.1 mM EDTA.
Absolute and CO difference spectra of purified P450s were
determined in the same buffer but containing 0.2% Emulgen and
0.5% cholate.

(e) Tolbutamide Hydroxylase Assays
Tolbutamide hydroxylase activity was measured
according to Knodell et al., J. Pharmacol. Exper. Ther.
241:1112-1119 (1987), with several modifications. Yeast
microsomes (1 mg protein) were preincubated with 300 pmol
hamster P450 reductase in 0.2 ml of the incubation buffer
(below) for 3 min at 37°C. The reaction was then placed on
ice and incubated in 0.2 ml of 50 mM HEPES buffer (pH 7.4)
containing 1.5 mM MgCl2, 0.1 mM EDTA in a final volume of 1 ml
and 1 mM sodium tolbutamide. The reaction was initiated with
0.5 mM NADPH. Human liver microsomes (0.22 mg protein) were
incubated without reductase. Incubations with reconstituted
recombinant P450s contained 50 pmol purified P450 enzyme,
150 pmol P450 reductase, and 15 μg
dilauroylphosphatidylcholine, and were performed in 100 mM
potassium phosphate
buffer (pH 7.4). Reactions were terminated after 60 min at
37°C by the addition of 50 μl of 4N HCl, followed by
extraction with 3 ml of water-saturated ethyl acetate. The
ethyl acetate extracts were dried under nitrogen at 40°C, the
residue resolubilized in 200 μl methanol, and
4-hydroxytolbutamide then assayed using HPLC by injecting
50 μl of the solubilized extract onto a μBondapak C18 column
(4.6 x 300 mm) using 0.05% phosphoric acid, pH 2.6:acetonitrile
(6:4, v/v) as the mobile phase with a flow rate of 1 ml/min.
The column eluate was monitored at 230 nm and rates of product
formation were determined from standard curves prepared by
adding varying amounts of 4-hydroxytolbutamide to incubations
conducted without NADPH. Preliminary experiments confirmed
that 4-hydroxytolbutamide formation by human liver microsomes
(30-120 pmol P450) was linear for up to 90 min. Samples were
analyzed in triplicate.
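Quantitation from a standard curve, as described above, can be illustrated with an ordinary least-squares line relating the amount of 4-hydroxytolbutamide added to the observed HPLC peak area. The Python sketch below is an illustration only: the text does not specify the fitting procedure, so a linear fit is an assumption, and all standard amounts, peak areas, and function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def turnover(peak_area, slope, intercept, minutes, nmol_p450):
    """Rate of product formation in nmol/min/nmol P450."""
    product_nmol = (peak_area - intercept) / slope
    return product_nmol / minutes / nmol_p450


# Hypothetical standards: nmol of 4-hydroxytolbutamide added to
# incubations run without NADPH, and the resulting peak areas.
standard_nmol = [0.5, 1.0, 2.0, 4.0]
standard_area = [50.0, 100.0, 200.0, 400.0]

slope, intercept = fit_line(standard_nmol, standard_area)

# Hypothetical sample: peak area 150 after 60 min with 0.05 nmol P450.
rate = turnover(150.0, slope, intercept, minutes=60.0, nmol_p450=0.05)
print(slope, intercept, rate)
```

Because the standards are carried through the same incubation and extraction as the samples, the fitted slope already absorbs extraction losses, so no separate recovery correction is applied in this sketch.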



(f) Mephenytoin 4'-Hydroxylase Assay

Mephenytoin 4'-hydroxylase activity was measured by
a modification of the radiometric HPLC assay described by
Shimada et al., J. Biol. Chem. 261:909-921 (1986), as
described below. Purified recombinant P450 or recombinant yeast
microsomes (10-50 pmol P450) were preincubated with
dilauroylphosphatidylcholine (15 μg per 50 pmol P450), P450
reductase (500 U per 50 pmol P450), and human cytochrome b5
(2:1 molar ratio when added). The reconstituted mixture was
preincubated for 5 min at 37°C, and then placed on ice. A
final concentration of 0.4 mM radiolabelled S- or
R-mephenytoin (20.7 and 20.9 mCi/mmol, respectively) was added
in 50 mM HEPES buffer (pH 7.4) containing 0.1 mM EDTA and
1.5 mM MgCl2 for recombinant 2C proteins. The mixture was then
incubated at 37°C with shaking for 3 min, and the reaction was
started with the addition of 2 mM NADPH and terminated after
30 min with an equal volume of methanol. Cytochrome b5 was not
included in all CYP2C18 reactions, since it had no effect or
produced a slight inhibition on the activity of this CYP protein.
Reaction volumes were generally 0.25 ml except when the volume
of recombinant purified cytochrome or yeast microsomes was
greater than 50 μl. In these cases, the volume was increased
to 0.5 ml to limit the volume of glycerol from the purified
preparation to <4% of the final volume. Incubations with
human microsomes did not contain exogenous P450 reductase or
cytochrome b5, and they were carried out in 0.1 M phosphate
buffer (pH 7.4) instead of HEPES buffer. Initial experiments
showed that S-mephenytoin hydroxylase activity of human liver
microsomes was linear for at least 60 minutes and from 0.05
through 0.2 mg microsomal protein, and that of the
R-enantiomer was linear through 1 mg microsomal protein.

At the end of the incubation period, the reactions
were terminated with an equal volume of methanol. The
incubation mixture was centrifuged at 10,000g for 10 min and
an aliquot assayed directly using HPLC without extraction.
Samples with particularly low activity were concentrated by
lyophilization and redissolved in a small volume of
methanol:water (1:1) before assay. The HPLC system consisted
of a reverse phase C18 (10 μm) Versapak, 300 mm x 4.1 mm column
(Altech Associates, Deerfield, IL) using an isocratic solvent
consisting of methanol:water (45:55) at a flow rate
of 1 ml/min for 25 min. Detection of radioactive peaks was
accomplished using an on-line Flow-One radiochemical detector
(Radiomatic Instruments Co., Tampa, FL). Detection of the
unlabeled 4'-hydroxymephenytoin authentic standard was
performed using an on-line multiwavelength UV detector at both
211 and 230 nm.

(g) Statistical Analyses

Tolbutamide hydroxylase and mephenytoin hydroxylase
activities of microsomes prepared from different recombinant
yeasts were compared by analysis of variance and by Fisher's
least significant difference test (Carmer et al., Am. Stat.
Ass. 68:66-74 (1973)).

3. Results

(a) Expression of P450 2C cDNAs in Yeast

Western blot analysis confirmed the expression of
the recombinant human CYP2C proteins in the recombinant yeast
(Fig. 6). Antibodies to 2C8 and 2C9 recognized polypeptide
bands of approximately 50,000 daltons (2C8) and 55,000 daltons
(2C9) which corresponded in mobility to those of the
recombinant proteins purified from yeast microsomes. These
mobilities corresponded to those of the corresponding 2C8 and
2C9 proteins purified from human liver. 2C19 was recognized
by antibodies to both the 2C9 and the 2C19 peptides. This
protein corresponded in mobility (<50,000 daltons) to the
lowest of three bands in Western blots of human liver
microsomes probed with antibody to human 2C9. The mobility of
2C18 was intermediate between that of 2C8 and 2C19.
Antibodies to 2C18 and 2C19 peptides were specific for their
antigen; however, antibody to 2C9 cross-reacted strongly with
2C19 and weakly with 2C8 and 2C18.

CO difference spectral analysis indicated that the
recombinant P450 2C proteins were expressed at levels as high
as 160-250 pmol/mg protein in some yeast microsomal
preparations. 2C18, 65 (2C9), and 25 (2C9) were expressed at
levels of 20 to 60 pmol/mg microsomal protein. Initially, 11a
(2C19) was expressed extremely poorly, and the CO difference
spectrum of the recombinant 2C19 yeast was indistinguishable
from that of control yeast (<7 pmol/mg protein). However,
after repeated transfections and selection, expression of 2C19
at ~17 pmol/mg protein was achieved. All of the CYP2C
proteins were low spin hemoproteins. CYP2C18 appeared to be
somewhat unstable in yeast microsomes with a large proportion
(~1/3 to 1/2) of the P450 being converted to P420 in the
presence of dithionite and carbon monoxide. None of the other
recombinant CYP2C proteins showed this lack of stability.

(b) Optimization of Tolbutamide and S-Mephenytoin
Hydroxylase Assays

Preliminary studies indicated that exogenous P450
reductase (500 U/50 pmol P450) stimulated metabolism of
tolbutamide by recombinant 2C9 in yeast microsomes >10-fold
and stimulated S-mephenytoin hydroxylase activity
approximately 2-fold. Activity of the recombinant 2C proteins
was linear with amount of P450 for 30 minutes through at least
20 pmol P450 for 2C19 (Fig. 7) and 50 pmol for the other CYP2C
forms. Cytochrome b5 stimulated S-mephenytoin hydroxylase
activity of both 2C9 and 2C19 in yeast microsomes and the
optimal ratio of b5 to P450 was approximately 2:1, but it
generally had no effect or produced a slight inhibition of
mephenytoin hydroxylase activity of 2C18 (Fig. 8). This
difference is consistent with the fact that all of the CYP2C
proteins except 2C18 contain a Ser at position 128 which is a
recognition site for cAMP protein kinase
(125Arg-Arg-Phe-Ser128) (Müller et al., FEBS Lett. 187:21-24
(1985)), and this sequence is also thought to be part of a b5
binding site (Jansson et al., Arch. Biochem. Biophys.
259:441-448 (1987)); 2C18 contains Cys at position 125.

Mephenytoin 4'-hydroxylase activity of recombinant
yeast microsomes was consistently higher in HEPES than
phosphate buffer, while activity of human liver microsomes was
~2-fold higher in phosphate buffer (pH 7.4). Therefore,
recombinant proteins were subsequently assayed in HEPES buffer
with exogenous reductase and cytochrome b5 except for 2C18,
which was tested both with and without cytochrome b5. Human
liver microsomal activities were assayed in phosphate buffer.

(c) Mephenytoin Hydroxylase Activity of Recombinant
Human 2C Proteins

S-mephenytoin 4'-hydroxylase activities of yeast
microsomes containing recombinant human CYP2C proteins were
compared under the optimized conditions described above. HPLC
profiles of the metabolites of S-mephenytoin produced by human
liver microsomes and recombinant human CYP2C proteins are
shown in Fig. 9 and the results summarized in Table III.
Recombinant 2C19 4'-hydroxylated S-mephenytoin at a rate of
~5 nmol/min/nmol P450, which was an order of magnitude
higher than the rate of 4'-hydroxylation in human liver
microsomes (Table III and Fig. 9). The retention time
(5-6 min) of the 4'-hydroxymephenytoin metabolite was identical
to that of the authentic unlabeled standard. 2C19 also
produced small quantities of two unknown metabolites eluted at
3-4 and 7-8 min. These unknown metabolites were also produced
by liver microsomes, and the metabolite with the shorter
retention time was the principal metabolite produced by 2C8.
Parent S-mephenytoin eluted at 14-15 min, followed by the
unknown impurity which eluted at 16-17 min. Similar retention
times were observed for R-mephenytoin and its metabolites.

The rate of 4'-hydroxymephenytoin formation by 2C19
was at least 100-fold higher than that of 2C9 (both alleles),
2C18 (both alleles) and 2C8 (Table III). The rate of
4'-hydroxylation of S-mephenytoin by 2C8 appeared to be lower
than that of 2C9 (0.02 nmol/min/nmol). The 4'-hydroxylation
of mephenytoin by 2C19 was stereospecific; the rate of
S-hydroxylation was at least 30-fold higher than that of
R-hydroxylation (Table III). In contrast, the 4'-hydroxylation
of mephenytoin by the other human CYP2C proteins did not
appear to be stereospecific.

TABLE III

S-Mephenytoin 4'-Hydroxylase Activities in
Recombinant Human CYP2C Yeast Microsomes

                           Mephenytoin 4'-Hydroxylase Activity
                                  (nmol/min/nmol P450)
Microsomes                       S                     R            R/S Ratio

Controls                   0.028 ± 0.001         0.024 ± 0.003        0.9
2C9-Ile359 (65)            0.043 ± 0.000         0.041 ± 0.005        0.9
2C9-Leu359 (25)            0.031 ± 0.009         0.040 ± 0.01
2C8                        0.037 ± 0.001         0.016 ± 0.001        0.4
2C18-Thr385 (29c) + b5     0.042 ± 0.004         0.054 ± 0.003        1.3
2C18-Thr385 (29c), no b5   0.034 ± 0.008
2C18-Met385 (6b)           0.023 ± 0.004         0.019 ± 0.005        0.9
2C19 (11a)                 4.6 ± 0.3 a,b,d       0.14 ± 0.02          0.03
Human liver
  microsomes HBl 6         0.283 ± 0.037 a,c,d   0.117 ± 0.017        0.4

S-Mephenytoin hydroxylase assayed as described in Methods Reartinn 
0.1 M HEPES buffer (pH 1 V) ^All'. radioactive substrate in 

0 n^^'In^T ^^^"ificancly higher than that of control yeast microsomes P < 
0.05. Analysis of variance and Fisher's Least Significant dTfSence test' 

^ 2C19 activity significantly higher than activities of all oth^r 
recombinant CYP2C proteins or human liver mic^isomes! ? < J'o5 

excer^cisT" f o^ol!""" ^^^nlficantly higher than recombinant microsomes 
actfiftiei?^p'<'*o"r^"" I^-Mephenytoin hydroxylase 
Recombinant CYP2C proteins were purified from yeast
microsomes, and their ability to 4'-hydroxylate the S- and R-
enantiomers of mephenytoin was also examined in a
reconstituted system (Table IV). 2C19 had similar turnover
numbers for S-mephenytoin 4'-hydroxylation in the
reconstituted system and in recombinant yeast microsomes
fortified with reductase. This turnover number was at least
10 times higher than that of human liver microsomes, and it
was 50-100 times higher than that of recombinant 2C9, 2C18 or
2C8. The turnover number of recombinant 2C9 was ~100 times
higher than the activity of a preparation of 2C9 purified from
human liver. 4'-Hydroxylation of mephenytoin by 2C19 was
stereospecific for the S-enantiomer, while metabolism by 2C9
was not stereospecific. Surprisingly, 2C18 appeared to be
stereoselective for the R-enantiomer of mephenytoin. The
turnover number of 2C19 for S-mephenytoin 4'-hydroxylase was
also ~30 times higher than the turnover number reported for a
P450 preparation purified from human liver by Srivastava et
al., Mol. Pharmacol. 40:69-79 (1991) (0.21 nmol/min/nmol
P450).

Although 2C9 exhibits poor catalytic activity toward
S-mephenytoin, this cytochrome appears to be the principal
tolbutamide hydroxylase (Tables IV and V). The turnover
numbers for hydroxylation of tolbutamide by the purified
recombinant 2C9 were somewhat lower than those of 2C9 purified
from human liver in the absence of exogenous reductase. The
Ile359 allele of 2C9 had a 3-fold higher turnover number for
tolbutamide than the Leu359 allele when the activity of the
recombinant microsomes was adjusted for P450 content
(Table V). 2C19 also appeared to metabolize tolbutamide at a
rate comparable to that of 2C9, although this rate was
difficult to estimate due to the low specific content of P450
in the recombinant 2C19 yeast clone available at the time of
these assays. The two alleles of 2C18 exhibited lower
tolbutamide hydroxylase activity than 2C9 in recombinant yeast
microsomes.



WO 95/30766    PCT/US95/05744



TABLE V

Tolbutamide Hydroxylase Activities of
Recombinant Human CYP2C Yeast Microsomes

                             P450 Content  Tolbutamide Hydroxylase Activity
                             (pmol/mg)     (nmol/min/mg protein)  (nmol/min/nmol P450)

Control Yeast                <5            0.3 ± 0.01
2C9-Ile359 (65)              55            169.8 ± 7.4 a,b        3.4 ± 0.15
2C9-Leu359 (25)              20            14.8 ± 0.3 a,c         0.99 ± 0.02
2C8                          80            8.5 ± 0.2 a            0.11 ± 0.003
2C18-Asp[?]Thr385 (29c-1a)   53            9.3 ± 0.73             0.19 ± 0.02
2C18-Asp[?]Met385 (6b-9)     34            11.1 ± 1.2 a           0.37 ± 0.04
2C19 (11a-3)                 <7            18.4 ± 2.4 a,d         ND
UC8936 Human Liver
  Microsomes                 227           116 ± 0.8 a            2.3 ± 0.02

Tolbutamide hydroxylase activities were measured as described in
Methods. Reaction mixtures contained 1 mg yeast microsomal protein
or 0.2 mg UC8936 human liver microsomal protein (50 pmol P450).
Purified P450 reductase (1,000 units) was included in reactions
with yeast microsomes but not human microsomes. Values are the
means ± SE. ND, not calculated due to low specific content of 2C19
in yeast in this experiment.

a Significantly higher than control yeast microsomes, P<0.05.
Pairwise comparisons using Fisher's Least Significant Difference
test.

b Clone 65 significantly higher than all other clones (P<0.0001).

c Clone 25 significantly greater than 2C8 (P<0.0005).

d Clone 11a significantly higher than 2C8 (P<0.0001).



The data show that CYP2C19 stereospecifically
hydroxylates S-mephenytoin at the 4'-position at a rate which
is at least 10 times higher than the rate in human liver
microsomes. This is the first example of a human CYP protein
which metabolizes S-mephenytoin with a turnover number
appreciably higher than that of human liver microsomes. Other
2C proteins showed about 100-fold lower activity than
2C19. One of the 2C9 variants tested (Ile359) is identical to
that reported by Yasumori et al., supra, to show a low level of
S-mephenytoin 4'-hydroxylase activity. The low rate of 4'-
hydroxylation of S-mephenytoin by 2C9 detected in the present
study with high specific activity 14C-labeled S-mephenytoin
undoubtedly explains the conflicting reports from various
laboratories concerning the ability of this cytochrome to
metabolize mephenytoin (Yasumori et al., supra; Srivastava et
al., supra; Relling et al., supra).

(d) Comparisons of Immunoblot Analysis of CYP2C
Proteins in Human Livers with Liver Microsomal S-Mephenytoin
4'-Hydroxylase Activities

Microsomes from 16 human liver donor samples
previously assayed for S- and R-mephenytoin 4'-hydroxylase
activities were analyzed for CYP2C proteins by Western blot
analysis (Fig. 10) using an antibody to 2C8 and a polyclonal
antibody to 2C9 and 2C19. Both 2C18 and 2C19 have mobilities
similar to that of the low molecular weight band recognized in
human microsomes by most antibodies to 2C9. However, an
antibody to a 2C19 peptide was specific for 2C19. 2C18 could
not be detected in human liver samples using a peptide
antibody to 2C18 (~5 pmol detection limit), indicating that
this polypeptide is expressed poorly (<50 pmol/mg).

The 2C19 content of liver microsomes was consistent
with their S-mephenytoin 4'-hydroxylase activities (Fig. 10).
In particular, samples 129 and 130 had extremely low S-
mephenytoin 4'-hydroxylase values and low S/R ratios, and 2C19
appeared to be essentially absent in these microsomal samples.
Densitometric analysis of immunoblots revealed that the 2C19
content of the 16 human liver microsomes correlated
significantly with S-mephenytoin 4'-hydroxylase activity
(r=0.718, P<0.005) (Fig. 11), but that the content of 2C9 did
not correlate with this catalytic activity (r=0.49, P>0.05).
There was also a significant correlation between 2C8 content
and S-mephenytoin 4'-hydroxylase activity (r=0.82, P<0.0001).
However, this correlation was probably fortuitous, because 2C8
shows very low S-mephenytoin 4'-hydroxylase activity either in
recombinant form or when purified from human liver.
Alternatively, the correlation may indicate an indirect
regulatory role for 2C8 in controlling S-mephenytoin 4'-
hydroxylase activity.

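
The significance of the correlation coefficients reported above can be
checked with a short calculation. The following is a minimal sketch,
assuming that the coefficients are Pearson correlations over the 16
liver samples and that a two-tailed t test with n - 2 = 14 degrees of
freedom is appropriate; the text does not state which test was used.
The function name `t_statistic` and the use of a tabulated critical
value are illustrative choices, not part of the original analysis.

```python
import math

# Two-tailed critical value of Student's t for alpha = 0.05 with
# 14 degrees of freedom (n = 16 samples, df = n - 2), from standard tables.
T_CRIT_05_DF14 = 2.145


def t_statistic(r: float, n: int) -> float:
    """Return the t statistic for testing a Pearson r against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


n_livers = 16
for label, r in [("2C19", 0.718), ("2C9", 0.49), ("2C8", 0.82)]:
    t = t_statistic(r, n_livers)
    verdict = "significant" if abs(t) > T_CRIT_05_DF14 else "not significant"
    print(f"{label}: r = {r:.3f}, t = {t:.2f}, {verdict} at P = 0.05")
```

Under these assumptions the coefficients for 2C19 and 2C8 exceed the
critical value, while that for 2C9 falls just below it, in line with
the P values quoted in the text.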

(e) Sequences of 2C9 and 2C18 mRNAs in Livers with
High or Low S-Mephenytoin 4'-Hydroxylase Activities

2C18 and 2C9 mRNAs from six of the above livers were
amplified by PCR and directly sequenced through areas of known
allelic variation to determine whether there was a
relationship between S-mephenytoin 4'-hydroxylase activity and
the presence of a particular allelic variant (Table VI). When
the total 2C18 PCR products were sequenced, the two
individuals with the highest S-mephenytoin hydroxylase
activity were homozygous for Thr385 (ACG). Of the two
individuals with the lowest activity, one was homozygous for
Met385, and one was heterozygous for Thr/Met385 (AC/TG). Two
individuals with intermediate activity were also homozygous
for Thr385. Similarly, when 2C9 mRNA from these same
individuals was amplified and sequenced through known allelic
variations, sample 108 (low S-mephenytoin 4'-hydroxylase
activity) was heterozygous for C/T430 (coding for Cys/Arg144),
while the other five individuals were homozygous for C430
(Arg144). On sequencing through bases 1072-1077, all
samples except 106 (high activity) read 1072-TACATT-1077,
coding for Tyr358Ile359. Sample 106 read TACA/CTT, indicating
that it was heterozygous for Ile/Leu359. These data indicate
that there is no relationship between the S-mephenytoin 4'-
hydroxylase activity of human liver microsomes and the
identity of the allelic variants of 2C18 (Thr/Met385) or 2C9
(Arg/Cys144, Tyr/Cys358, Ile/Leu359) in these tissues.
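
The relationship between residue numbers and nucleotide positions used
in this section can be illustrated with a small sketch. It assumes
that nucleotide 1 is the A of the ATG start codon, which is consistent
with bases 1072-1077 encoding residues 358-359. The names `CODONS`,
`codon_span` and `translate` are hypothetical, and the codon pair
CGT/TGT for residue 144 is an assumption, since the text gives only
the C/T change.

```python
# Subset of the standard genetic code (DNA alphabet) covering only the
# codons discussed in the text.
CODONS = {
    "ACG": "Thr", "ATG": "Met",                  # 2C18 residue 385
    "CGT": "Arg", "TGT": "Cys",                  # 2C9 residue 144 (assumed codons)
    "TAC": "Tyr", "ATT": "Ile", "CTT": "Leu",    # 2C9 residues 358-359
}


def codon_span(residue: int) -> tuple:
    """Return (first, last) coding-sequence positions of a residue's codon."""
    first = 3 * (residue - 1) + 1
    return first, first + 2


def translate(dna: str) -> list:
    """Translate an in-frame DNA string using the small table above."""
    return [CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3)]


print(codon_span(358), codon_span(359))   # codons spanning bases 1072-1077
print(translate("TACATT"))                # Ile359 allele
print(translate("TACCTT"))                # Leu359 allele
```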

TABLE VI

Alleles in Human Livers with Varying S-Mephenytoin
4'-Hydroxylase Phenotypes

              S-MP OHase     Liver  2C18
Phenotype     (nmol/min/mg)  donor  allele      2C9 allele

High          0.286          106    Thr385      Arg144      His276  Tyr358  Ile/Leu359
High          0.351          115    Thr385      Arg144      His276  Tyr358  Ile359
Intermediate  0.070          118    Thr385      Arg144      His276  Tyr358  Leu359
Intermediate  0.081          123    Thr385      Arg144      His276  Tyr358  Ile359
Low           0.051          108    Thr/Met385  Arg/Cys144  His276  Tyr358  Ile359
Low           0.025          129    Met/Met385  Arg144      His276  Tyr358  Ile359


4. Conclusion

These results show that 2C19 has a turnover number
for the 4'-hydroxylation of S-mephenytoin about 100-fold
higher than that of 2C9, 2C18, or 2C8. 2C19 hydroxylation was
stereospecific for the S-enantiomer. The hepatic content of
2C19 in 16 liver microsomal samples correlated with their S-
mephenytoin 4'-hydroxylase activities. 2C9 appeared to be the
primary tolbutamide hydroxylase, although 2C19 may also
contribute to this catalytic activity. The identity of the
allelic variant of 2C9 or 2C18 did not influence S-mephenytoin
4'-hydroxylase activity. These data strongly indicate that
2C19 is the key determinant of S-mephenytoin 4'-hydroxylase
activity in human liver.

Example 6: Diagnostic Assays for Detecting Individuals
Deficient in S-Mephenytoin 4'-Hydroxylase Activity

Individuals deficient in S-mephenytoin 4'-
hydroxylase activity are identified by analysis of
their genomic DNA or cDNA encoding 2C19.




(a) Analysis of full-length cDNA

Liver microsomes were prepared by standard
differential centrifugation methods (2) from human liver
samples previously characterized as varying markedly in S-
mephenytoin 4'-hydroxylase activity in vitro. Total liver RNA
was isolated from the liver samples with TRI Reagent (Molecular
Research Center, Inc.) and reverse transcribed using random
hexamers as 3' primers. Overlapping CYP2C19 cDNA fragments
from five human liver samples that showed poor metabolism of
S-mephenytoin in vitro were amplified by the polymerase chain
reaction (PCR). PCR was performed on an aliquot of the cDNA
in 1 X PCR buffer (67 mM Tris-HCl pH 8.8, 17 mM (NH4)2SO4, 10
mM β-mercaptoethanol, 7 µM EDTA, 0.2 mg bovine serum
albumin/ml), 50 µM each of dATP, dCTP, dGTP and dTTP, 0.25 µM
of both PCR primers, 2.5 U AmpliTaq DNA polymerase (Perkin
Elmer Cetus) and 1.0 mM MgCl2. The PCR conditions were: initial
denaturation at 94°C for 3 min; 35 cycles consisting of
denaturation at 94°C for 30 sec, annealing at 53°C for 30 sec
and extension at 72°C for 30 sec; final extension at 72°C for
10 min; using a Perkin Elmer thermocycler. PCR products (20
µl) were analyzed on 3% agarose gels stained with ethidium
bromide.

The PCR fragments were purified using Microcon
filters (Amicon Inc.) and sequenced in a cycle sequencing
reaction employing fluorescence-tagged dye terminators (PRISM,
Applied Biosystems). One partial CYP2C19 cDNA
was isolated which exhibited aberrant splicing of exon 5 (Fig.
12). This cDNA was missing the initial 40 bases of exon 5,
and was also missing a SmaI site (Fig. 12). This deletion
would be predicted to produce an early stop codon resulting in
a truncated defective protein.
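
The reasoning behind the predicted early stop codon is that a deletion
whose length is not a multiple of three shifts the downstream reading
frame. The sketch below illustrates this with a hypothetical toy
sequence; it is not the CYP2C19 sequence, and the names
`causes_frameshift` and `first_stop_codon_index` are introduced here
for illustration only.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def causes_frameshift(deletion_length: int) -> bool:
    """A deletion shifts the reading frame unless its length is a multiple of 3."""
    return deletion_length % 3 != 0


def first_stop_codon_index(cds: str):
    """Return the 0-based codon index of the first in-frame stop, or None."""
    usable = len(cds) - len(cds) % 3
    for i in range(0, usable, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


# Hypothetical toy coding sequence: prefix + 40-nt segment + downstream part.
PREFIX = "ATGGCC"
DELETED = "GCT" * 13 + "G"              # 40 nucleotides, removed in the mutant
DOWNSTREAM = "CTGTAACCGGTCTGGCTTAA"

normal = PREFIX + DELETED + DOWNSTREAM  # stop codon only at the very end
mutant = PREFIX + DOWNSTREAM            # 40-nt deletion shifts the frame

print(causes_frameshift(len(DELETED)))
print(first_stop_codon_index(normal), first_stop_codon_index(mutant))
```

In the toy example the normal sequence is read through to its final
codon, whereas the deleted sequence meets a stop codon shortly after
the deletion point.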

(b) Rapid Assay for Identifying 40 bp Deletion in
cDNA

The analysis of full-length cDNAs identified a 40 bp
deletion as a likely cause of S-mephenytoin 4'-hydroxylase
activity deficiency. A rapid assay was therefore devised to
analyze the specific region of a 2C19 cDNA molecule spanning
the 40 bp deletion.

Specific PCR primers were designed to amplify the
region of the CYP2C19 cDNA spanning the deletion (Figs. 12 and
13). mRNA from 13 human livers previously characterized for
extensive or poor metabolism of S-mephenytoin in vitro was
reverse transcribed and amplified by PCR. Liver samples with
the highest S-mephenytoin hydroxylase activity contained only
the normally spliced mRNA. By contrast, sample 35 (a probable
poor metabolizer) produced an amplification product containing
the 40 bp deletion. Samples with intermediate S-mephenytoin
4'-hydroxylase activity and low amounts of CYP2C19 protein
exhibited both the normal 2C19 cDNA and 2C19 cDNA containing
the 40 bp deletion.

(c) Genomic Sequencing of 2C19

Because human tissue samples containing genomic 2C19
DNA are much more easily obtained than samples containing 2C19
mRNA, it is preferable to diagnose a polymorphic defect from
genomic DNA. Genomic DNA was isolated from the blood of human
volunteers previously characterized as poor or extensive
metabolizers of S-mephenytoin in vivo. The in vivo phenotype
of most Swiss subjects was based on a hydroxylation index,
with a value above 5.6 identifying a poor metabolizer (Kupfer
et al., Eur. J. Clin. Pharmacol. 26:753-759 (1984)). The in
vivo phenotype of American, Oriental and one Swiss subject was
based on the urinary S/R ratio (Wedlund et al., Clin.
Pharmacol. Ther. 36:773-780 (1984)), a poor metabolizer (PM)
being defined as having a ratio > 0.95. An extensive
metabolizer (EM) is defined as having a ratio < 0.8. An
intermediate phenotype (IM) has been previously described with
the extent of 4'-hydroxylation being greater than in PMs but
with the rate of metabolite formation being slower than in EMs
(Arns et al., Pharmacologist 32:140 (1990)).
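
The phenotype cutoffs quoted above can be written as a simple
classification, sketched below. The function names are hypothetical.
The text does not say how an S/R ratio between 0.8 and 0.95 is
handled, nor does it name the class for a hydroxylation index of 5.6
or below, so the labels "unclassified" and "non-PM" are assumptions
made only for this illustration.

```python
SR_RATIO_PM_CUTOFF = 0.95             # urinary S/R ratio above this: PM
SR_RATIO_EM_CUTOFF = 0.8              # urinary S/R ratio below this: EM
HYDROXYLATION_INDEX_PM_CUTOFF = 5.6   # hydroxylation index above this: PM


def phenotype_from_sr_ratio(sr_ratio: float) -> str:
    """Classify a subject from the urinary S/R ratio."""
    if sr_ratio > SR_RATIO_PM_CUTOFF:
        return "PM"
    if sr_ratio < SR_RATIO_EM_CUTOFF:
        return "EM"
    return "unclassified"   # assumption: range not defined in the text


def phenotype_from_hydroxylation_index(index: float) -> str:
    """Classify a subject from the hydroxylation index."""
    return "PM" if index > HYDROXYLATION_INDEX_PM_CUTOFF else "non-PM"


print(phenotype_from_sr_ratio(1.0), phenotype_from_sr_ratio(0.5))
print(phenotype_from_hydroxylation_index(6.0))
```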

It was believed that the 40 bp deletion identified
in 2C19 cDNA occurred in exon 5, near the border with intron 4,
based on a comparison of the gene structure of CYP2C9 and
CYP2C18 (de Morais et al., supra). Thus, a segment of genomic
2C19 DNA across the intron 4/exon 5 border was amplified to
identify the corresponding genetic defect in genomic DNA. In
the initial assays, the untranslated regions of the genomic
2C19 sequence were not known. However, intron 4 primers could
be designed based on the corresponding sequences from CYP2C9,
which are expected to show about 95% sequence identity based
on comparison with partial genomic sequences of 2C19. The
primer for exon 5 was based on the cDNA sequence of CYP2C19
(see Example 1). The amplified DNA fragment was found to have
the same size in both poor and extensive metabolizers.
However, on restriction analysis, it was found that only the
fragment from extensive metabolizers could be digested with
SmaI. The amplified DNA fragment was sequenced in extensive
and poor metabolizers.

Provision of the genomic 2C19 DNA sequence in the intron
4 region allowed the design of a specific intron primer
exhibiting perfect complementarity to the 2C19 DNA sequence in
subsequent experiments. The forward PCR primer from intron 4
was 5'-AATTACAACCAGAGCTTGGC-3' and the reverse primer from
exon 5 was 5'-TATCACTTTCCATAAAAGCAAG-3'. The forward primer
anneals 81 bp upstream of the intron 4/exon 5 junction. PCR
conditions were as for amplification of cDNA, except that
reactions used 200 ng of genomic DNA and an initial
denaturation at 96°C for 5 min. PCR products were restricted
with SmaI in the PCR buffer, without purification. Uncut
products had the same size (168 bp) in all samples. Digested
PCR products were analyzed on 4% agarose gels stained with
ethidium bromide.
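
The logic of the restriction assay can be sketched as follows: an
allele that retains the SmaI site (CCCGGG, cut between CCC and GGG) is
cleaved into two fragments, an allele that has lost the site remains
uncut, and a heterozygote shows all three bands. The amplicons below
are hypothetical toy sequences, not the 168 bp product described
above, and the function names are illustrative.

```python
SMAI_SITE = "CCCGGG"
SMAI_CUT_OFFSET = 3   # SmaI cuts between CCC and GGG (blunt ends)


def digest_fragments(amplicon: str, site: str = SMAI_SITE,
                     cut_offset: int = SMAI_CUT_OFFSET) -> list:
    """Return fragment lengths after cutting at every occurrence of the site."""
    cuts = []
    start = amplicon.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = amplicon.find(site, start + 1)
    bounds = [0] + cuts + [len(amplicon)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def predicted_bands(alleles: list) -> list:
    """Return the distinct band sizes expected from a pair of alleles."""
    sizes = set()
    for allele in alleles:
        sizes.update(digest_fragments(allele))
    return sorted(sizes, reverse=True)


# Hypothetical 45-nt toy amplicons differing by a single G-to-A change.
WT_TOY = "A" * 20 + "TTCCCGGGAA" + "T" * 15    # SmaI site present
MUT_TOY = "A" * 20 + "TTCCCAGGAA" + "T" * 15   # SmaI site abolished

print(predicted_bands([WT_TOY, WT_TOY]))     # site on both alleles
print(predicted_bands([WT_TOY, MUT_TOY]))    # heterozygote
print(predicted_bands([MUT_TOY, MUT_TOY]))   # site on neither allele
```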

DNA from 18 unrelated Caucasian extensive
metabolizers and 10 unrelated Caucasian poor metabolizers was
analyzed by this strategy (Fig. 14C). All extensive
metabolizers were either homozygous or heterozygous for the
normal CYP2C19 gene, defined here as CYP2C19wt (wild type).
Among the 10 poor metabolizers, 7 were homozygous for the
defective gene, defined as CYP2C19m (poor mephenytoin
hydroxylation). One poor metabolizer was heterozygous
(CYP2C19wt/CYP2C19m), and two were homozygous for the normal
gene (CYP2C19wt/CYP2C19wt), indicating that CYP2C19m accounted
for 15 of 20 alleles tested (75%) in Caucasian poor metabolizers.
The presence of 5 CYP2C19wt alleles in poor metabolizers
suggests that additional mutations may exist in the Caucasian
population, but that CYP2C19m represents the predominant defect.

Segments of DNA spanning the intron 4/exon 5
boundary were also amplified from 17 unrelated Oriental
subjects. Figure 14D shows that 10/17 Oriental poor
metabolizers are homozygous for CYP2C19m and that CYP2C19m
accounts for 25 of 34 alleles (74%) in Oriental poor
metabolizers. All 12 unrelated Oriental extensive
metabolizers were either homozygous or heterozygous for the
CYP2C19wt gene. Thus, the major mutation responsible for the
poor metabolizer phenotype in Oriental subjects is identical
to that found in Caucasians.
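
The allele percentages quoted for the two groups of poor metabolizers
follow from simple counting, as sketched below. The function name is
hypothetical. For the Oriental group the number of heterozygotes (5)
is not stated in this passage; it is inferred here from 25 defective
alleles = 2 x 10 homozygotes + 5, leaving 2 subjects without the
defective allele.

```python
def defective_allele_share(homozygous_defective: int,
                           heterozygous: int,
                           no_defective_allele: int) -> tuple:
    """Return (defective alleles, total alleles, percent defective)."""
    subjects = homozygous_defective + heterozygous + no_defective_allele
    total = 2 * subjects
    defective = 2 * homozygous_defective + heterozygous
    return defective, total, 100.0 * defective / total


# Caucasian poor metabolizers: 7 homozygous, 1 heterozygous, 2 without it.
print(defective_allele_share(7, 1, 2))

# Oriental poor metabolizers: 10 homozygous of 17; 5 heterozygotes inferred.
print(defective_allele_share(10, 5, 2))
```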

The inheritance of CYP2C19m in one Oriental family
previously characterized with respect to the PM trait was also
examined. Figure 14B shows that the poor metabolizer proband
(arrow) and two other related poor metabolizers are homozygous
for CYP2C19m. Two individuals identified earlier as obligate
heterozygotes (family C) (Ward et al., Clin. Pharmacol. Ther.
42:96-99 (1987)) were indeed found to be CYP2C19wt/CYP2C19m.
Thus, the inheritance of the genotype agrees with the
Mendelian autosomal-recessive inheritance of the phenotype.

The DNA of three individuals (CYP2C19wt/CYP2C19wt,
CYP2C19wt/CYP2C19m, and CYP2C19m/CYP2C19m) was amplified as
described above and sequenced directly using an automated
sequencer (Applied Biosystems) (Fig. 15). Surprisingly, the
sequence of intron 4 of the defective gene was identical to
that of the normal gene. The only alteration found in
CYP2C19m was a G→A change in exon 5 corresponding to position
681 of the cDNA. This mutation introduces a cryptic splice
site in this exon. This mutation also abolishes a SmaI site at
this position (CCCGGG → CCCAGG). The cryptic splice site
shows slightly greater sequence identity to the consensus
sequence for mammalian splice sites (Green, Ann. Rev. Cell
Biol. 7:559-599 (1991)) than the normal splice site. A second
potential branch point is also seen near the cryptic splice
site. Surprisingly, the cDNA sequences from CYP2C8 and
CYP2C18 have a comparable potential cryptic splice site at the
same point in exon 5 to that of CYP2C19m, but the presence of
the full-length 2C8 protein on immunoblots of human liver
microsomes indicates that the majority of its mRNA is
spliced correctly.

Three of the samples tested by cDNA analysis in
Figure 13 (sample 12, predicted genotype CYP2C19wt/CYP2C19wt;
sample 21, predicted genotype CYP2C19wt/CYP2C19m; and sample
35, predicted genotype CYP2C19m/CYP2C19m) were retested by
genomic analysis. Perfect agreement was observed. The
cryptic splice site appeared to be used exclusively in sample
35, which is a predicted poor metabolizer, and also in liver RNA
of an additional CYP2C19m/CYP2C19m individual. The selection
of the cryptic splice site results in the absence of CYP2C19
in liver microsomes from poor metabolizers (Fig. 13).

(d) Conclusion

The principal genetic defect (CYP2C19m) which is
responsible for the poor metabolism of S-mephenytoin is a G→A
mutation at position 681 of the coding sequence (within exon
5). CYP2C19m accounts for 75% of the defective alleles in
both Caucasian and Oriental poor metabolizers. The single
base change generates a cryptic internal splice site, which is
used exclusively to produce an aberrantly spliced mRNA
containing a 40 bp deletion. The CYP2C19 protein is virtually
absent in livers of poor metabolizers. The mutation at
position 681 is easily detected by PCR amplification of a
segment of genomic 2C19 DNA spanning the mutation.

Example 7: Identification and Diagnostic Assay for a Second
Polymorphism (designated 636) in 2C19

A second mutation, designated the 636 polymorphism
(also known as CYP2C19m2), has been identified. Genomic DNA from
an Oriental poor metabolizer (subject 43 in Example 6) was
amplified by PCR using a forward primer complementary to the
antisense strand of intron 3 extending from bases -79 to -55
and a reverse primer complementary to the sense strand
extending from 79-89 bases into intron 4 (forward primer 5'-
rATTATCTGTTAACTAATATGA-3' (SEQ. ID. No. 57) and reverse primer
5'-ACTTCAGGGCTTGGTCAATA-3' (SEQ. ID. No. 58)). These primers
were selected to amplify a 329 base pair product containing
all of exon 4 and the surrounding intron/exon junctions. See
Figure 17. Sequencing of the PCR products with an Applied
Biosystems sequencer identified two mutations in exon 4 of the
Oriental poor metabolizer. A second mutation at nucleotide
636 entailed a G→A transition at the nucleotide level and the
conversion of a tryptophan codon at position 212 (TGG→TGA) to
a premature stop codon. This change would result in a
truncated 211 amino acid polypeptide containing only the first
4 exons, which would not contain the heme-binding region and
would be inactive. The change at position 636 also destroys a
BamHI site (GGATCC→GAATCC) (or its isoschizomer BstI) at
positions 635-640.
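
The two consequences of the change at position 636 can be checked on a
short stretch of the coding sequence. The sketch below uses the twelve
bases CCCTGGATCCAG, taken from the region of SEQ ID NO: 2 that encodes
Pro-Trp-Ile-Gln (residues 211-214), and assumes that coding position 1
is the A of the ATG start codon, so that this stretch spans positions
631-642. The function and constant names are hypothetical.

```python
WILD_TYPE = "CCCTGGATCCAG"   # coding positions 631-642 (assumed numbering)
FRAGMENT_START = 631
BAMHI_SITE = "GGATCC"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def point_mutate(fragment: str, position: int, new_base: str,
                 fragment_start: int = FRAGMENT_START) -> str:
    """Replace the base at a coding-sequence position within the fragment."""
    i = position - fragment_start
    return fragment[:i] + new_base + fragment[i + 1:]


def codon_at(fragment: str, residue: int,
             fragment_start: int = FRAGMENT_START) -> str:
    """Return the codon for a residue number from the fragment."""
    first = 3 * (residue - 1) + 1 - fragment_start
    return fragment[first:first + 3]


mutant = point_mutate(WILD_TYPE, 636, "A")

print(codon_at(WILD_TYPE, 212), codon_at(mutant, 212))
print(codon_at(mutant, 212) in STOP_CODONS)
print(BAMHI_SITE in WILD_TYPE, BAMHI_SITE in mutant)
print(FRAGMENT_START + WILD_TYPE.find(BAMHI_SITE))   # first base of the site
```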

A PCR test was developed using the primers described
above to amplify a 329 base pair product. The PCR product
from the wild-type DNA from extensive metabolizers was cut
with BamHI to yield two expected fragments with sizes of 233
base pairs and 96 base pairs (Fig. 18). The PCR fragment
amplified from the individual with the 636 mutation (i.e.,
Oriental subject #43) could not be restricted, indicating that
he was homozygous for the 636 mutation. Genotyping of 7
Oriental poor metabolizers whose phenotype could not be
explained by the previous 681 mutation indicated that subjects
41 and 43 were homozygous for the 636 mutation, while subjects
36, 48, 11, 69, and 100 were heterozygous, bearing both
636 and 681 mutant alleles. The DNA in homozygous 636 mutant
subjects 41 and 43 was not cut by BamHI. The DNA in the
heterozygotes yielded three bands at 327, 232, and 95 bp. The
DNA from these heterozygotes also yielded three bands on
SmaI digestion (169, 120, and 49 bp), indicating that they were
also heterozygous for the 681 mutation (named CYP2C19m).
These data show that the 636 and 681 mutations completely
account for the low phenotypes in all of the Oriental poor
metabolizers of S-mephenytoin tested (17 individuals with 34
alleles).

Three Caucasian poor metabolizers who were not
homozygous for the 681 mutation were also genotyped for the
636 mutation. These were subjects 501, 502 and 503. One of
these individuals (501) was heterozygous for the 681 mutation
while the other two did not contain the 681 mutation in either
allele. None of these individuals exhibited a 636 mutation.
Thus, there is probably at least one additional polymorphism
in 2C19 in Caucasians.

In summary, the 681 and 636 mutations explain 100%
of Oriental poor metabolizers, and the 681 mutation alone
accounts for about 75% of the alleles in Caucasian poor
metabolizers.

While the foregoing invention has been described in 
some detail for purposes of clarity and understanding, it will 
be clear to one skilled in the art from a reading of this 
disclosure that various changes in form and detail can be made 
without departing from the true scope of the invention. All 
publications and patent documents cited in this application 
are incorporated by reference in their entirety for all 
purposes to the same extent as if each individual publication 
or patent document were so individually denoted. 

(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 490 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

Met Asp Pro Phe Val Val Leu Val Leu Cys Leu Ser Cys Leu Leu Leu
1               5                   10                  15

Leu Ser Ile Trp Arg Gln Ser Ser Gly Arg Gly Lys Leu Pro Pro Gly
            20                  25                  30

Pro Thr Pro Leu Pro Val Ile Gly Asn Ile Leu Gln Ile Asp Ile Lys
        35                  40                  45

Asp Val Ser Lys Ser Leu Thr Asn Leu Ser Lys Ile Tyr Gly Pro Val
    50                  55                  60

Phe Thr Leu Tyr Phe Gly Leu Glu Arg Met Val Val Leu His Gly Tyr
65                  70                  75                  80

Glu Val Val Lys Glu Ala Leu Ile Asp Leu Gly Glu Glu Phe Ser Gly
                85                  90                  95

Arg Gly His Phe Pro Leu Ala Glu Arg Ala Asn Arg Gly Phe Gly Ile
            100                 105                 110

Val Phe Ser Asn Gly Lys Arg Trp Lys Glu Ile Arg Arg Phe Ser Leu
        115                 120                 125

Met Thr Leu Arg Asn Phe Gly Met Gly Lys Arg Ser Ile Glu Asp Arg
    130                 135                 140

Val Gln Glu Glu Ala Arg Cys Leu Val Glu Glu Leu Arg Lys Thr Lys
145                 150                 155                 160

Ala Ser Pro Cys Asp Pro Thr Phe Ile Leu Gly Cys Ala Pro Cys Asn
                165                 170                 175

Val Ile Cys Ser Ile Ile Phe Gln Lys Arg Phe Asp Tyr Lys Asp Gln
            180                 185                 190

Gln Phe Leu Asn Leu Met Glu Lys Leu Asn Glu Asn Ile Arg Ile Val
        195                 200                 205

Ser Thr Pro Trp Ile Gln Ile Cys Asn Asn Phe Pro Thr Ile Ile Asp
    210                 215                 220

Tyr Phe Pro Gly Thr His Asn Lys Leu Leu Lys Asn Leu Ala Phe Met
225                 230                 235                 240

Glu Ser Asp Ile Leu Glu Lys Val Lys Glu His Gln Glu Ser Met Asp
                245                 250                 255

Ile Asn Asn Pro Arg Asp Phe Ile Asp Cys Phe Leu Ile Lys Met Glu
            260                 265                 270

Lys Glu Lys Gln Asn Gln Gln Ser Glu Phe Thr Ile Glu Asn Leu Val
        275                 280                 285

Ile Thr Ala Ala Asp Leu Leu Gly Ala Gly Thr Glu Thr Thr Ser Thr
    290                 295                 300

Thr Leu Arg Tyr Ala Leu Leu Leu Leu Leu Lys His Pro Glu Val Thr
305                 310                 315                 320

Ala Lys Val Gln Glu Glu Ile Glu Arg Val Ile Gly Arg Asn Arg Ser
                325                 330                 335

Pro Cys Met Gln Asp Arg Gly His Met Pro Tyr Thr Asp Ala Val Val
            340                 345                 350

His Glu Val Gln Arg Tyr Ile Asp Leu Ile Pro Thr Ser Leu Pro His
        355                 360                 365

Ala Val Thr Cys Asp Val Lys Phe Arg Asn Tyr Leu Ile Pro Lys Gly
    370                 375                 380

Thr Thr Ile Leu Thr Ser Leu Thr Ser Val Leu His Asp Asn Lys Glu
385                 390                 395                 400

Phe Pro Asn Pro Glu Met Phe Asp Pro Arg His Phe Leu Asp Glu Gly
                405                 410                 415

Gly Asn Phe Lys Lys Ser Asn Tyr Phe Met Pro Phe Ser Ala Gly Lys
            420                 425                 430

Arg Ile Cys Val Gly Glu Gly Leu Ala Arg Met Glu Leu Phe Leu Phe
        435                 440                 445

Leu Thr Phe Ile Leu Gln Asn Phe Asn Leu Lys Ser Leu Ile Asp Pro
    450                 455                 460

Lys Asp Leu Asp Thr Thr Pro Val Val Asn Gly Phe Ala Ser Val Pro
465                 470                 475                 480

Pro Phe Tyr Gln Leu Cys Phe Ile Pro Val
                485                 490

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1746 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

CTTCAATGGA TCCTTTTGTG GTCCTTGTGC TCTGTCTCTC ATGTTTGCTT CTCCTTTCAA 60

TCTGGAGACA GAGCTCTGGG AGAGGAAAAC TCCCTCCTGG CCCCACTCCT CTCCCAGTGA 120

TTGGAAATAT CCTACAGATA GATATTAAGG ATGTCAGCAA ATCCTTAACC AATCTCTCAA 180

AAATCTATGG CCCTGTGTTC ACTCTGTATT TTGGCCTGGA ACGCATGGTG GTGCTGCATG 240

GATATGAAGT GGTGAAGGAA GCCCTGATTG ATCTTGGAGA GGAGTTTTCT GGAAGAGGCC 300

ATTTCCCACT GGCTGAAAGA GCTAACAGAG [illegible] CGTTTTCAGC AATGGAAAGA 360

GATGGAAGGA GATCCGGCGT TTCTCCCTCA TGACGCTGCG GAATTTTGGG ATGGGGAAGA 420

GGAGCATTGA GGACCGTGTT CAAGAGGAAG CCCGCTGCCT TGTGGAGGAG TTGAGAAAAA 480

CCAAGGCTTC ACCCTGTGAT CCCACTTTCA TCCTGGGCTG TGCTCCCTGC AATGTGATCT 540

GCTCCATTAT TTTCCAGAAA CGTTTCGATT ATAAAGATCA GCAATTTCTT AACTTGATGG 600

AAAAATTGAA TGAAAACATC AGGATTGTAA GCACCCCCTG GATCCAGATA TGCAATAATT 660

TTCCCACTAT CATTGATTAT TTCCCGGGAA CCCATAACAA ATTACTTAAA AACCTTGCTT 720

TTATGGAAAG TGATATTTTG GAGAAAGTAA AAGAACACCA AGAATCGATG GACATCAACA 780

ACCCTCGGGA CTTTATTGAT TGCTTCCTGA TCAAAATGGA GAAGGAAAAG CAAAACCAAC 840

AGTCTGAATT CACTATTGAA AACTTGGTAA TCACTGCAGC TGACTTACTT GGAGCTGGGA 900

CAGAGACAAC AAGCACAACC CTGAGATATG CTCTCCTTCT CCTGCTGAAG CACCCAGAGG 960

TCACAGCTAA AGTCCAGGAA GAGATTGAAC GTGTCATTGG CAGAAACCGG AGCCCCTGCA 1020

TGCAGGACAG GGGCCACATG CCCTACACAG ATGCTGTGGT GCACGAGGTC CAGAGATACA 1080

TCGACCTCAT CCCCACCAGC CTGCCCCATG CAGTGACCTG TGACGTTAAA TTCAGAAACT 1140

ACCTCATTCC CAAGGGCACA ACCATATTAA CTTCCCTCAC TTCTGTGCTA CATGACAACA 1200

AAGAATTTCC CAACCCAGAG ATGTTTGACC CTCGTCACTT TCTGGATGAA GGTGGAAATT 1260

TTAAGAAAAG TAACTACTTC ATGCCTTTCT CAGCAGGAAA ACGGATTTGT GTGGGAGAGG 1320

GCCTGGCCCG CATGGAGCTG TTTTTATTCC TGACCTTCAT TTTACAGAAC TTTAACCTGA 1380

AATCTCTGAT TGACCCAAAG GACCTTGACA CAACTCCTGT TGTCAATGGA TTTGCTTCTG 1440

TCCCGCCCTT CTATCAGCTG TGCTTCATTC CTGTCTGAAG AAGCACAGAT GGTCTGGCTG 1500

CTCCTGTGCT GTCCCTGCAG CTCTCTTTCC TCTGGTCCAA ATTTCACTAT CTGTGATGCT 1560

TCTTCTGACC CGTCATCTCA CATTTTCCCT TCCCCCAAGA TCTAGTGAAC ATTCAGCCTC 1620

CATTAAAAAA GTTTCACTGT GCAAATATAT CTGCTATTCC CCATACTCTA TAATAGTTAC 1680

ATTGAGTGCC ACATAATGCT GATACTTGTC TAATGTTGAG TTATTAACAT ATTATTATTA 1740

AATAGA 1746



(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 490 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

Met Asp Ser Leu Val Val Leu Val Leu Cys Leu Ser Cys Leu Leu Leu
1               5                   10                  15

Leu Ser Leu Trp Arg Gln Ser Ser Gly Arg Gly Lys Leu Pro Pro Gly
            20                  25                  30

Pro Thr Pro Leu Pro Val Ile Gly Asn Ile Leu Gln Ile Gly Ile Lys
        35                  40                  45

Asp Ile Ser Lys Ser Leu Thr Asn Leu Ser Lys Val Tyr Gly Pro Val
    50                  55                  60

Phe Thr Leu Tyr Phe Gly Leu Lys Pro Ile Val Val Leu His Gly Tyr
65                  70                  75                  80

Glu Ala Val Lys Glu Ala Leu Ile Asp Leu Gly Glu Glu Phe Ser Gly
                85                  90                  95

Arg Gly Ile Phe Pro Leu Ala Glu Arg Ala Asn Arg Gly Phe Gly Ile
            100                 105                 110

Val Phe Ser Asn Gly Lys Lys Trp Lys Glu Ile Arg Arg Phe Ser Leu
        115                 120                 125

Met Thr Leu Arg Asn Phe Gly Met Gly Lys Arg Ser Ile Glu Asp Arg
    130                 135                 140

Val Gln Glu Glu Ala Arg Cys Leu Val Glu Glu Leu Arg Lys Thr Lys
145                 150                 155                 160

Ala Ser Pro Cys Asp Pro Thr Phe Ile Leu Gly Cys Ala Pro Cys Asn
                165                 170                 175

Val Ile Cys Ser Ile Ile Phe His Lys Arg Phe Asp Tyr Lys Asp Gln
            180                 185                 190

Gln Phe Leu Asn Leu Met Glu Lys Leu Asn Glu Asn Ile Lys Ile Leu
        195                 200                 205

Ser Ser Pro Trp Ile Gln Ile Cys Asn Asn Phe Ser Pro Ile Ile Asp
    210                 215                 220

Tyr Phe Pro Gly Thr His Asn Lys Leu Leu Lys Asn Val Ala Phe Met
225                 230                 235                 240

Lys Ser Tyr Ile Leu Glu Lys Val Lys Glu His Gln Glu Ser Met Asp
                245                 250                 255

Met Asn Asn Pro Gln Asp Phe Ile Asp Cys Phe Leu Met Lys Met Glu
            260                 265                 270

Lys Glu Lys His Asn Gln Pro Ser Glu Phe Thr Ile Glu Ser Leu Glu
        275                 280                 285

Asn Thr Ala Val Asp Leu Phe Gly Ala Gly Thr Glu Thr Thr Ser Thr
    290                 295                 300

Thr Leu Arg Tyr Ala Leu Leu Leu Leu Leu Lys His Pro Glu Val Thr
305                 310                 315                 320

Ala Lys Val Gln Glu Glu Ile Glu Arg Val Ile Gly Arg Asn Arg Ser
                325                 330                 335

Pro Cys Met Gln Asp Arg Ser His Met Pro Tyr Thr Asp Ala Val Val
            340                 345                 350

His Glu Val Gln Arg Tyr Leu Asp Leu Leu Pro Thr Ser Leu Pro His
        355                 360                 365

Ala Val Thr Cys Asp Ile Lys Phe Arg Asn Tyr Leu Ile Pro Lys Gly
    370                 375                 380

Thr Thr Ile Leu Ile Ser Leu Thr Ser Val Leu His Asp Asn Lys Glu
385                 390                 395                 400

Phe Pro Asn Pro Glu Met Phe Asp Pro His His Phe Leu Asp Glu Gly
                405                 410                 415

Gly Asn Phe Lys Lys Ser Lys Tyr Phe Met Pro Phe Ser Ala Gly Lys
            420                 425                 430

Arg Ile Cys Val Gly Glu Ala Leu Ala Gly Met Glu Leu Phe Leu Phe
        435                 440                 445

Leu Thr Ser Ile Leu Gln Asn Phe Asn Leu Lys Ser Leu Val Asp Pro
    450                 455                 460

Lys Asn Leu Asp Thr Thr Pro Val Val Asn Gly Phe Ala Ser Val Pro
465                 470                 475                 480

Pro Phe Tyr Gln Leu Cys Phe Ile Pro Val
                485                 490
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(2) INFORMATION FOR SEQ ID N0:4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1854 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

GAGAAGGCTT CAATGGATTC TCTTGTGGTC CTTGTGCTCT GTCTCTCATG TTTGCTTCTC 60 

CTTTCACTCT GGAGACAGAG CTCTGGGAGA GGAAAACTCC CTCCTGGCCC CACTCCTCTC 120 

CCAGTGATTG GAAATATCCT ACAGATAGGT ATTAAGGACA TCAGCAAATC CTTAACCAAT 180 

CTCTCAAAGG TCTATGGCCC TGTGTTCACT CTGTATTTTG GCCTGAAACC CATAGTGGTG 240

CTGCATGGAT ATGAAGCAGT GAAGGAAGCC CTGATTGATC TTGGAGAGGA GTTTTCTGGA 300

AGAGGCATTT TCCCACTGGC TGAAAGAGCT AACAGAGGAT TTGGAATTGT TTTCAGCAAT 360 

GGAAAGAAAT GGAAGGAGAT CCGGCGTTTC TCCCTCATGA CGCTGCGGAA TTTTGGGATG 420 

GGGAAGAGGA GCATTGAGGA CCGTGTTCAA GAGGAAGCCC GCTGCCTTGT GGAGGAGTTG 480 

AGAAAAACCA AGGCCTCACC CTGTGATCCC ACTTTCATCC TGGGCTGTGC TCCCTGCAAT 540 

GTGATCTGCT CCATTATTTT CCATAAACGT TTTGATTATA AAGATCAGCA ATTTCTTAAC 600

TTAATGGAAA AGTTGAATGA AAACATCAAG ATTTTGAGCA GCCCCTGGAT CCAGATCTGC 660 

AATAATTTTT CTCCTATCAT TGATTACTTC CCGGGAACTC ACAACAAATT ACTTAAAAAC 720 

GTTGCTTTTA TGAAAAGTTA TATTTTGGAA AAAGTAAAAG AACACCAAGA ATCAATGGAC 780 

ATGAACAACC CTCAGGACTT TATTGATTGC TTCCTGATGA AAATGGAGAA GGAAAAGCAC 840 

AACCAACCAT CTGAATTTAC TATTGAAAGC TTGGAAAACA CTGCAGTTGA CTTGTTTGGA 900 

GCTGGGACAG AGACGACAAG CACAACCCTG AGATATGCTC TCCTTCTCCT GCTGAAGCAC 960 

CCAGAGGTCA CAGCTAAAGT CCAGGAAGAG ATTGAACGTG TGATTGGCAG AAACCGGAGC 1020 

CCCTGCATGC AAGACAGGAG CCACATGCCC TACACAGATG CTGTGGTGCA CGAGGTCCAG 1080 

AGATACCTTG ACCTTCTCCC CACCAGCCTG CCCCATGCAG TGACCTGTGA CATTAAATTC 1140 

AGAAACTATC TCATTCCCAA GGGCACAACC ATATTAATTT CCCTGACTTC TGTGCTACAT 1200 

GACAACAAAG AATTTCCCAA CCCAGAGATG TTTGACCCTC ATCACTTTCT GGATGAAGGT 1260 

GGCAATTTTA AGAAAAGTAA ATACTTCATG CCTTTCTCAG CAGGAAAACG GATTTGTGTG 1320 

GGAGAAGCCC TGGCCGGCAT GGAGCTGTTT TTATTCCTGA CCTCCATTTT ACAGAACTTT 1380 

AACCTGAAAT CTCTGGTTGA CCCAAAGAAC CTTGACACCA CTCCAGTTGT CAATGGTTTT 1440

GCCTCTGTGC CGCCCTTCTA CCAGCTGTGC TTCATTCCTG TCTGAAGAAG AGCAGATGGC 1500 

CTGGCTGCTG CTGTGCAGTC CCTGCAGCTC TCTTTCCTCT GGGGCATTAT CCATCTTTCA 1560 

CTATCTGTAA TGCCTTTTCT CACCTGTCAT CTCACATTTT CCCTTCCCTG AAGATCTAGT 1620 

GAACATTCGA CCTTCATTAC GGAGAGTTTC CTATGTTTCA CTGTGCAAAT ATATCTGCTA 1680
TTCTCCATAC TCTGTAACAG TTGCATTGAC TGTCACATAA TGCTCATACT TATCTAATGT 1740
TGAGTTATTA ATATGTTATT ATTAAATAGA GAAATATGAT TTGTGTATTA TAATTCAAAG 1800 
GCATTTCTTT TCTGCATGTT CTAAATAAAA AGCATTATTA TTTGCTGAAA AAAA 1854 

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 490 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 
(xi) SEQUENCE DESCRIPTION: SEQ ID N0:5: 

Met Asp Pro Ala Val Ala Leu Val Leu Cys Leu Ser Cys Leu Phe Leu
1 5 10 15

Leu Ser Leu Trp Arg Gln Ser Ser Gly Arg Gly Arg Leu Pro Ser Gly
20 25 30

Pro Thr Pro Leu Pro Ile Ile Gly Asn Ile Leu Gln Leu Asp Val Lys
35 40 45

Asp Met Ser Lys Ser Leu Thr Asn Phe Ser Lys Val Tyr Gly Pro Val
50 55 60

Phe Thr Val Tyr Phe Gly Leu Lys Pro Ile Val Val Leu His Gly Tyr
65 70 75 80

Glu Ala Val Lys Glu Ala Leu Ile Asp His Gly Glu Glu Phe Ser Gly
85 90 95

Arg Gly Ser Phe Pro Val Ala Glu Lys Val Asn Lys Gly Leu Gly Ile
100 105 110

Leu Phe Ser Asn Gly Lys Arg Trp Lys Glu Ile Arg Arg Phe Cys Leu
115 120 125

Met Thr Leu Arg Asn Phe Gly Met Gly Lys Arg Ser Ile Glu Asp Arg
130 135 140

Val Gln Glu Glu Ala Arg Cys Leu Val Glu Glu Leu Arg Lys Thr Asn
145 150 155 160

Ala Ser Pro Cys Asp Pro Thr Phe Ile Leu Gly Cys Ala Pro Cys Asn
165 170 175

Val Ile Cys Ser Val Ile Phe His Asp Arg Phe Asp Tyr Lys Asp Gln
180 185 190

Arg Phe Leu Asn Leu Met Glu Lys Phe Asn Glu Asn Leu Arg Ile Leu
195 200 205

Ser Ser Pro Trp Ile Gln Val Cys Asn Asn Phe Pro Ala Leu Ile Asp
210 215 220

Tyr Leu Pro Gly Ser His Asn Lys Ile Ala Glu Asn Phe Ala Tyr Ile
225 230 235 240

Lys Ser Tyr Val Leu Glu Arg Ile Lys Glu His Gln Glu Ser Leu Asp
245 250 255

Met Asn Ser Ala Arg Asp Phe Ile Asp Cys Phe Leu Ile Lys Met Glu
260 265 270

Gln Glu Lys His Asn Gln Gln Ser Glu Phe Thr Val Glu Ser Leu Ile
275 280 285

Ala Thr Val Thr Asp Met Phe Gly Ala Gly Thr Glu Thr Thr Ser Thr
290 295 300

Thr Leu Arg Tyr Gly Leu Leu Leu Leu Leu Lys Tyr Pro Glu Val Thr
305 310 315 320

Ala Lys Val Gln Glu Glu Ile Glu Cys Val Val Gly Arg Asn Arg Ser
325 330 335

Pro Cys Met Gln Asp Arg Ser His Met Pro Tyr Thr Asp Ala Val Val
340 345 350

His Glu Ile Gln Arg Tyr Ile Asp Leu Leu Pro Thr Asn Leu Pro His
355 360 365

Ala Val Thr Cys Asp Val Lys Phe Lys Asn Tyr Leu Ile Pro Lys Gly
370 375 380

Thr Thr Ile Ile Thr Ser Leu Thr Ser Val Leu His Asn Asp Lys Glu
385 390 395 400

Phe Pro Asn Pro Glu Met Phe Asp Pro Gly His Phe Leu Asp Lys Ser
405 410 415

Gly Asn Phe Lys Lys Ser Asp Tyr Phe Met Pro Phe Ser Ala Gly Lys
420 425 430

Arg Met Cys Met Gly Glu Gly Leu Ala Arg Met Glu Leu Phe Leu Phe
435 440 445

Leu Thr Thr Ile Leu Gln Asn Phe Asn Leu Lys Ser Gln Val Asp Pro
450 455 460

Lys Asp Ile Asp Ile Thr Pro Ile Ala Asn Ala Phe Gly Arg Val Pro
465 470 475 480

Pro Leu Tyr Gln Leu Cys Phe Ile Pro Val
485 490

(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2009 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

GGCACCGGAA AGAACAAGAA AAAAGAACAC CTTATTTTTA TCTTCTTCAG TGAGCCAATG 60

TTCATTCAAA AGAGAGATTA AAGTGCTTTT TGCTGACTAG TCACAGTCAG AGTCAGAATC 120 

ACAGGTGGAT TAGTAGGGAG TGTTATAAAA GCCTTGAAGT GAAAGCCCGC AGTTGTCTTA 180 

CTAAGAAGAG AAGCCTTCAA TGGATCCAGC TGTGGCTCTG GTGCTCTGTC TCTCCTGTTT 240

GTTTCTCCTT TCACTCTGGA GGCAGAGCTC TGGAAGAGGG AGGCTCCCGT CTGGCCCCAC 300 

TCCTCTCCCG ATTATTGGAA ATATCCTGCA GTTAGATGTT AAGGACATGA GCAAATCCTT 360

AACCAATTTC TCAAAAGTCT ATGGCCCTGT GTTCACTGTG TATTTTGGCC TGAAGCCCAT 420 

TGTGGTGTTG CATGGATATG AAGCAGTGAA GGAGGCCCTG ATTGATCATG GAGAGGAGTT 480

TTCTGGAAGA GGAAGTTTTC CAGTGGCTGA AAAAGTTAAC AAAGGACTTG GAATCCTTTT 540

CAGCAATGGA AAGAGATGGA AGGAGATCCG GCGTTTCTGC CTCATGACTC TGCGGAATTT 600

TGGGATGGGG AAGAGGAGCA TCGAGGACCG TGTTCAAGAG GAAGCCCGCT GCCTTGTGGA 660 

GGAGTTGAGA AAAACCAATG CCTCACCCTG TGATCCCACT TTCATCCTGG GCTGTGCTCC 720

CTGCAATGTG ATCTGCTCTG TTATTTTCCA TGATCGATTT GATTATAAAG ATCAGAGGTT 780 

TCTTAACTTG ATGGAAAAAT TCAATGAAAA CCTCAGGATT CTGAGCTCTC CATGGATCCA 840 

GGTCTGCAAT AATTTCCCTG CTCTCATCGA TTATCTCCCA GGAAGTCATA ATAAAATAGC 900 

TGAAAATTTT GCTTACATTA AAAGTTATGT ATTGGAGAGA ATAAAAGAAC ATCAAGAATC 960 

CCTGGACATG AACAGTGCTC GGGACTTTAT TGATTGTTTC CTGATCAAAA TGGAACAGGA 1020

AAAGCACAAT CAACAGTCTG AATTTACTGT TGAAAGCTTG ATAGCCACTG TAACTGATAT 1080 

GTTTGGGGCT GGAACAGAGA CAACGAGCAC CACTCTGAGA TATGGACTCC TGCTCCTGCT 1140 

GAAGTACCCA GAGGTCACAG CTAAAGTCCA GGAAGAGATT GAATGTGTAG TTGGCAGAAA 1200 

CCGGAGCCCC TGTATGCAGG ACAGGAGTCA CATGCCCTAC ACAGATGCTG TGGTGCACGA 1260 

GATCCAGAGA TACATTGACC TCCTCCCCAC CAACCTGCCC CATGCAGTGA CCTGTGATGT 1320 

TAAATTCAAA AACTACCTCA TCCCCAAGGG CACGACCATA ATAACATCCC TGACTTCTGT 1380 

GCTGCACAAT GACAAAGAAT TCCCCAACCC AGAGATGTTT GACCCTGGCC ACTTTCTGGA 1440

TAAGAGTGGC AACTTTAAGA AAAGTGACTA CTTCATGCCT TTCTCAGCAG GAAAACGGAT 1500 

GTGTATGGGA GAGGGCCTGG CCCGCATGGA GCTGTTTTTA TTCCTGACCA CCATTTTGCA 1560

GAACTTTAAC CTGAAATCTC AGGTTGACCC AAAGGATATT GACATCACCC CCATTGCCAA 1620

TGCATTTGGT CGTGTGCCAC CCTTGTACCA GCTCTGCTTC ATTCCTGTCT GAAGAAGGGC 1680 

AGATAGTTTG GCTGCTCCTG TGCTGTCACC TGCAATTCTC CCTTATCAGG GCCATTAGCC 1740

TCTCCCTTCT CTCTGTGAGG GATATTTTCT CTGACTTGTC AATCCACATC TTCCCATTCC 1800 

CTCAAGATCC AATGAACATC CAACCTCCAT TAAAGAGAGT TTCTTGGGTC ACTTCCTAAA 1860 

TATATCTGCT ATTCTCCATA CTCTGTATCA CTTGTATTGA CCACCACATA TGCTAATACC 1920

TATCTACTGC TGAGTTGTCA GTATGTTATC ACTAGAAAAC AAAGAAAAAT GATTAATAAA 1980 

TGACAATTCA GAGCCAAAAA AAAAAAAAA 2009 



(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 490 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

Met Glu Pro Phe Val Val Leu Val Leu Cys Leu Ser Phe Met Leu Leu
1 5 10 15

Phe Ser Leu Trp Arg Gln Ser Cys Arg Arg Arg Lys Leu Pro Pro Gly
20 25 30

Pro Thr Pro Leu Pro Ile Ile Gly Asn Met Leu Gln Ile Asp Val Lys
35 40 45

Asp Ile Cys Lys Ser Phe Thr Asn Phe Ser Lys Val Tyr Gly Pro Val
50 55 60

Phe Thr Val Tyr Phe Gly Met Asn Pro Ile Val Val Phe His Gly Tyr
65 70 75 80

Glu Ala Val Lys Glu Ala Leu Ile Asp Asn Gly Glu Glu Phe Ser Gly
85 90 95

Arg Gly Asn Ser Pro Ile Ser Gln Arg Ile Thr Lys Gly Leu Gly Ile
100 105 110

Ile Ser Ser Asn Gly Lys Arg Trp Lys Glu Ile Arg Arg Phe Ser Leu
115 120 125

Thr Asn Leu Arg Asn Phe Gly Met Gly Lys Arg Ser Ile Glu Asp Arg
130 135 140

Val Gln Glu Glu Ala His Cys Leu Val Glu Glu Leu Arg Lys Thr Lys
145 150 155 160

Ala Ser Pro Cys Asp Pro Thr Phe Ile Leu Gly Cys Ala Pro Cys Asn
165 170 175

Val Ile Cys Ser Val Val Phe Gln Lys Arg Phe Asp Tyr Lys Asp Gln
180 185 190

Asn Phe Leu Thr Leu Met Lys Arg Phe Asn Glu Asn Phe Arg Ile Leu
195 200 205

Asn Ser Pro Trp Ile Gln Val Cys Asn Asn Phe Pro Leu Leu Ile Asp
210 215 220

Cys Phe Pro Gly Thr His Asn Lys Val Leu Lys Asn Val Ala Leu Thr
225 230 235 240

Arg Ser Tyr Ile Arg Glu Lys Val Lys Glu His Gln Ala Ser Leu Asp
245 250 255

Val Asn Asn Pro Arg Asp Phe Met Asp Cys Phe Leu Ile Lys Met Glu
260 265 270

Gln Glu Lys Asp Asn Gln Lys Ser Glu Phe Asn Ile Glu Asn Leu Val
275 280 285

Gly Thr Val Ala Asp Leu Phe Val Ala Gly Thr Glu Thr Thr Ser Thr
290 295 300

Thr Leu Arg Tyr Gly Leu Leu Leu Leu Leu Lys His Pro Glu Val Thr
305 310 315 320

Ala Lys Val Gln Glu Glu Ile Asp His Val Ile Gly Arg His Arg Ser
325 330 335

Pro Cys Met Gln Asp Arg Ser His Met Pro Tyr Thr Asp Ala Val Val
340 345 350

His Glu Ile Gln Arg Tyr Ser Asp Leu Val Pro Thr Gly Val Pro His
355 360 365

Ala Val Thr Thr Asp Thr Lys Phe Arg Asn Tyr Leu Ile Pro Lys Gly
370 375 380

Thr Thr Ile Met Ala Leu Leu Thr Ser Val Leu His Asp Asp Lys Glu
385 390 395 400

Phe Pro Asn Pro Asn Ile Phe Asp Pro Gly His Phe Leu Asp Lys Asn
405 410 415

Gly Asn Phe Lys Lys Ser Asp Tyr Phe Met Pro Phe Ser Ala Gly Lys
420 425 430

Arg Ile Cys Ala Gly Glu Gly Leu Ala Arg Met Glu Leu Phe Leu Phe
435 440 445

Leu Thr Thr Ile Leu Gln Asn Phe Asn Leu Lys Ser Val Asp Asp Leu
450 455 460

Lys Asn Leu Asn Thr Thr Ala Val Thr Lys Gly Ile Val Ser Leu Pro
465 470 475 480

Pro Ser Tyr Gln Ile Cys Phe Ile Pro Val
485 490



(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1829 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

AATGGAACCT TTTGTGGTCC TGGTGCTGTG TCTCTCTTTT ATGCTTCTCT TTTCACTCTG 60

GAGACAGAGC TGTAGGAGAA GGAAGCTCCC TCCTGGCCCC ACTCCTCTTC CTATTATTGG 120

AAATATGCTA CAGATAGATG TTAAGGACAT CTGCAAATCT TTCACCAATT TCTCAAAAGT 180

CTATGGTCCT GTGTTCACCG TGTATTTTGG CATGAATCCC ATAGTGGTGT TTCATGGATA 240

TGAGGCAGTG AAGGAAGCCC TGATTGATAA TGGAGAGGAG TTTTCTGGAA GAGGCAATTC 300

CCCAATATCT CAAAGAATTA CTAAAGGACT TGGAATCATT TCCAGCAATG GAAAGAGATG 360

GAAGGAGATC CGGCGTTTCT CCCTCACAAA CTTGCGGAAT TTTGGGATGG GGAAGAGGAG 420

CATTGAGGAC CGTGTTCAAG AGGAAGCTCA CTGCCTTGTG GAGGAGTTGA GAAAAACCAA 480

GGCTTCACCC TGTGATCCCA CTTTCATCCT GGGCTGTGCT CCCTGCAATG TGATCTGCTC 540

CGTTGTTTTC CAGAAACGAT TTGATTATAA AGATCAGAAT TTTCTCACCC TGATGAAAAG 600

ATTCAATGAA AACTTCAGGA TTCTGAACTC CCCATGGATC CAGGTCTGCA ATAATTTCCC 660

TCTACTCATT GATTGTTTCC CAGGAACTCA CAACAAAGTG CTTAAAAATG TTGCTCTTAC 720

ACGAAGTTAC ATTAGGGAGA AAGTAAAAGA ACACCAAGCA TCACTGGATG TTAACAATCC 780

TCGGGACTTT ATGGATTGCT TCCTGATCAA AATGGAGCAG GAAAAGGACA ACCAAAAGTC 840

AGAATTCAAT ATTGAAAACT TGGTTGGCAC TGTAGCTGAT CTATTTGTTG CTGGAACAGA 900

GACAACAAGC ACCACTCTGA GATATGGACT CCTGCTCCTG CTGAAGCACC CAGAGGTCAC 960

AGCTAAAGTC CAGGAAGAGA TTGATCATGT AATTGGCAGA CACAGGAGCC CCTGCATGCA 1020

GGATAGGAGC CACATGCCTT ACACTGATGC TGTAGTGCAC GAGATCCAGA GATACAGTGA 1080

CCTTGTCCCC ACCGGTGTGC CCCATGCAGT GACCACTGAT ACTAAGTTCA GAAACTACCT 1140

CATCCCCAAG GGCACAACCA TAATGGCATT ACTGACTTCC GTGCTACATG ATGACAAAGA 1200

ATTTCCTAAT CCAAATATCT TTGACCCTGG CCACTTTCTA GATAAGAATG GCAACTTTAA 1260

GAAAAGTGAC TACTTCATGC CTTTCTCAGC AGGAAAACGA ATTTGTGCAG GAGAAGGACT 1320

TGCCCGCATG GAGCTATTTT TATTTCTAAC CACAATTTTA CAGAACTTTA ACCTGAAATC 1380

TGTTGATGAT TTAAAGAACC TCAATACTAC TGCAGTTACC AAAGGGATTG TTTCTCTGCC 1440

ACCCTCATAC CAGATCTGCT TCATCCCTGT CTGAAGAATG CTAGCCCATC TGGCTGCTGA 1500

TCTGCTATCA CCTGCAACTC TTTTTTTATC AAGGACATTC CCACTATTAT GTCTTCTCTG 1560

ACCTCTCATC AAATCTTCCC ATTCACTCAA TATCCCATAA GCATCCAAAC TCCATTAAGG 1620

AGAGTTGTTC AGGTCACTGC ACAAATATAT CTGCAATTAT TCATACTCTG TAACACTTGT 1680

ATTAATTGCT GCATATGCTA ATACTTTTCT AATGCTGACT TTTTAATATG TTATCACTGT 1740

AAAACACAGA AAAGTGATTA ATGAATGATA ATTTAGTCCA TTTCTTTTGT GAATGTGCTA 1800

AATAAAAAGT GTTATTAATT GCTGGTTCA 1829



(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 490 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

Met Asp Ser Leu Val Val Leu Val Leu Cys Leu Ser Cys Leu Leu Leu
1 5 10 15

Leu Ser Leu Trp Arg Gln Ser Ser Gly Arg Gly Lys Leu Pro Pro Gly
20 25 30

Pro Thr Pro Leu Pro Val Ile Gly Asn Ile Leu Gln Ile Gly Ile Lys
35 40 45

Asp Ile Ser Lys Ser Leu Thr Asn Leu Ser Lys Val Tyr Gly Pro Val
50 55 60

Phe Thr Leu Tyr Phe Gly Leu Lys Pro Ile Val Val Leu His Gly Tyr
65 70 75 80

Glu Ala Val Lys Glu Ala Leu Ile Asp Leu Gly Glu Glu Phe Ser Gly
85 90 95

Arg Gly Ile Phe Pro Leu Ala Glu Arg Ala Asn Arg Gly Phe Gly Ile
100 105 110

Val Phe Ser Asn Gly Lys Lys Trp Lys Glu Ile Arg Arg Phe Ser Leu
115 120 125

Met Thr Leu Arg Asn Phe Gly Met Gly Lys Arg Ser Ile Glu Asp Arg
130 135 140

Val Gln Glu Glu Ala Arg Cys Leu Val Glu Glu Leu Arg Lys Thr Lys
145 150 155 160

Ala Ser Pro Cys Asp Pro Thr Phe Ile Leu Gly Cys Ala Pro Cys Asn
165 170 175

Val Ile Cys Ser Ile Ile Phe His Lys Arg Phe Asp Tyr Lys Asp Gln
180 185 190

Gln Phe Leu Asn Leu Met Glu Lys Leu Asn Glu Asn Ile Lys Ile Leu
195 200 205



WO 95/30766 PCT/US95/05744

Ser Ser Pro Trp Ile Gln Ile Cys Asn Asn Phe Ser Pro Ile Ile Asp
210 215 220

Tyr Phe Pro Gly Thr His Asn Lys Leu Leu Lys Asn Val Ala Phe Met
225 230 235 240

Lys Ser Tyr Ile Leu Glu Lys Val Lys Glu His Gln Glu Ser Met Asp
245 250 255

Met Asn Asn Pro Gln Asp Phe Ile Asp Cys Phe Leu Met Lys Met Glu
260 265 270

Lys Glu Lys His Asn Gln Pro Ser Glu Phe Thr Ile Glu Ser Leu Glu
275 280 285

Asn Thr Ala Val Asp Leu Phe Gly Ala Gly Thr Glu Thr Thr Ser Thr
290 295 300

Thr Leu Arg Tyr Ala Leu Leu Leu Leu Leu Lys His Pro Glu Val Thr
305 310 315 320

Ala Lys Val Gln Glu Glu Ile Glu Arg Val Ile Gly Arg Asn Arg Ser
325 330 335

Pro Cys Met Gln Asp Arg Ser His Met Pro Tyr Thr Asp Ala Val Val
340 345 350

His Glu Val Gln Arg Tyr Ile Asp Leu Leu Pro Thr Ser Leu Pro His
355 360 365

Ala Val Thr Cys Asp Ile Lys Phe Arg Asn Tyr Leu Ile Pro Lys Gly
370 375 380

Thr Thr Ile Leu Ile Ser Leu Thr Ser Val Leu His Asp Asn Lys Glu
385 390 395 400

Phe Pro Asn Pro Glu Met Phe Asp Pro His His Phe Leu Asp Glu Gly
405 410 415

Gly Asn Phe Lys Lys Ser Lys Tyr Phe Met Pro Phe Ser Ala Gly Lys
420 425 430

Arg Ile Cys Val Gly Glu Ala Leu Ala Gly Met Glu Leu Phe Leu Phe
435 440 445

Leu Thr Ser Ile Leu Gln Asn Phe Asn Leu Lys Ser Leu Val Asp Pro
450 455 460

Lys Asn Leu Asp Thr Thr Pro Val Val Asn Gly Phe Ala Ser Val Pro
465 470 475 480

Pro Phe Tyr Gln Leu Cys Phe Ile Pro Val
485 490

(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1852 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

GAAGGCTTCA ATGGATTCTC TTGTGGTCCT TGTGCTCTGT CTCTCATGTT TGCTTCTCCT 60

TTCACTCTGG AGACAGAGCT CTGGGAGAGG AAAACTCCCT CCTGGCCCCA CTCCTCTCCC 120

AGTGATTGGA AATATCCTAC AGATAGGTAT TAAGGACATC AGCAAATCCT TAACCAATCT 180

CTCAAAGGTC TATGGCCCTG TGTTCACTCT GTATTTTGGC CTGAAACCCA TAGTGGTGCT 240

GCATGGATAT GAAGCAGTGA AGGAAGCCCT GATTGATCTT GGAGAGGAGT TTTCTGGAAG 300

AGGCATTTTC CCACTGGCTG AAAGAGCTAA CAGAGGATTT GGAATTGTTT TCAGCAATGG 360

AAAGAAATGG AAGGAGATCC GGCGTTTCTC CCTCATGACG CTGCGGAATT TTGGGATGGG 420

GAAGAGGAGC ATTGAGGACC GTGTTCAAGA GGAAGCCCGC TGCCTTGTGG AGGAGTTGAG 480

AAAAACCAAG GCCTCACCCT GTGATCCCAC TTTCATCCTG GGCTGTGCTC CCTGCAATGT 540

GATCTGCTCC ATTATTTTCC ATAAACGTTT TGATTATAAA GATCAGCAAT TTCTTAACTT 600

AATGGAAAAG TTGAATGAAA ACATCAAGAT TTTGAGCAGC CCCTGGATCC AGATCTGCAA 660

TAATTTTTCT CCTATCATTG ATTACTTCCC GGGAACTCAC AACAAATTAC TTAAAAACGT 720

TGCTTTTATG AAAAGTTATA TTTTGGAAAA AGTAAAAGAA CACCAAGAAT CAATGGACAT 780

GAACAACCCT CAGGACTTTA TTGATTGCTT CCTGATGAAA ATGGAGAAGG AAAAGCACAA 840

CCAACCATCT GAATTTACTA TTGAAAGCTT GGAAAACACT GCAGTTGACT TGTTTGGAGC 900

TGGGACAGAG ACGACAAGCA CAACCCTGAG ATATGCTCTC CTTCTCCTGC TGAAGCACCC 960

AGAGGTCACA GCTAAAGTCC AGGAAGAGAT TGAACGTGTG ATTGGCAGAA ACCGGAGCCC 1020

CTGCATGCAA GACAGGAGCC ACATGCCCTA CACAGATGCT GTGGTGCACG AGGTCCAGAG 1080

ATACATTGAC CTTCTCCCCA CCAGCCTGCC CCATGCAGTG ACCTGTGACA TTAAATTCAG 1140

AAACTATCTC ATTCCCAAGG GCACAACCAT ATTAATTTCC CTGACTTCTG TGCTACATGA 1200

CAACAAAGAA TTTCCCAACC CAGAGATGTT TGACCCTCAT CACTTTCTGG ATGAAGGTGG 1260

CAATTTTAAG AAAAGTAAAT ACTTCATGCC TTTCTCAGCA GGAAAACGGA TTTGTGTGGG 1320

AGAAGCCCTG GCCGGCATGG AGCTGTTTTT ATTCCTGACC TCCATTTTAC AGAACTTTAA 1380

CCTGAAATCT CTGGTTGACC CAAAGAACCT TGACACCACT CCAGTTGTCA ATGGATTTGC 1440

CTCTGTGCCG CCCTTCTACC AGCTGTGCTT CATTCCTGTC TGAAGAAGAG CAGATGGCCT 1500

GGCTGCTGCT GTGCAGTCCC TGCAGCTCTC TTTCCTCTGG GGCATTATCC ATCTTTCACT 1560

ATCTGTAATG CCTTTTCTCA CCTGTCATCT CACATTTTCC CTTCCCTGAA GATCTAGTGA 1620

ACATTCGACC TCCATTACGG AGAGTTTCCT ATGTTTCACT GTGCAAATAT ATCTGCTATT 1680

CTCCATACTC TGTAACAGTT GCATTGACTG TCACATAATG CTCATACTTA TCTAATGTTG 1740

AGTTATTAAT ATGTTATTAT TAAATAGAGA AATATGATTT GTGTATTATA ATTCAAAGGC 1800

ATTTCTTTTC TGCATGTTCT AAATAAAAAG CATTATTATT TGCTGAAAAA AA 1852

(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 490 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

Met Asp Pro Ala Val Ala Leu Val Leu Cys Leu Ser Cys Leu Phe Leu
1 5 10 15

Leu Ser Leu Trp Arg Gln Ser Ser Gly Arg Gly Arg Leu Pro Ser Gly
20 25 30

Pro Thr Pro Leu Pro Ile Ile Gly Asn Ile Leu Gln Leu Asp Val Lys
35 40 45

Asp Met Ser Lys Ser Leu Thr Asn Phe Ser Lys Val Tyr Gly Pro Val
50 55 60

Phe Thr Val Tyr Phe Gly Leu Lys Pro Ile Val Val Leu His Gly Tyr
65 70 75 80

Glu Ala Val Lys Glu Ala Leu Ile Asp His Gly Glu Glu Phe Ser Gly
85 90 95

Arg Gly Ser Phe Pro Val Ala Glu Lys Val Asn Lys Gly Leu Gly Ile
100 105 110

Leu Phe Ser Asn Gly Lys Arg Trp Lys Glu Ile Arg Arg Phe Cys Leu
115 120 125

Met Thr Leu Arg Asn Phe Gly Met Gly Lys Arg Ser Ile Glu Asp Arg
130 135 140

Val Gln Glu Glu Ala Arg Cys Leu Val Glu Glu Leu Arg Lys Thr Asn
145 150 155 160

Ala Ser Pro Cys Asp Pro Thr Phe Ile Leu Gly Cys Ala Pro Cys Asn
165 170 175

Val Ile Cys Ser Val Ile Phe His Asp Arg Phe Asp Tyr Lys Asp Gln
180 185 190

Arg Phe Leu Asn Leu Met Glu Lys Phe Asn Glu Asn Leu Arg Ile Leu
195 200 205

Ser Ser Pro Trp Ile Gln Val Cys Asn Asn Phe Pro Ala Leu Ile Asp
210 215 220

Tyr Leu Pro Gly Ser His Asn Lys Ile Ala Glu Asn Phe Ala Tyr Ile
225 230 235 240

Lys Ser Tyr Val Leu Glu Arg Ile Lys Glu His Gln Glu Ser Leu Asp
245 250 255

Met Asn Ser Ala Arg Asp Phe Ile Asp Cys Phe Leu Ile Lys Met Glu
260 265 270

Gln Glu Lys His Asn Gln Gln Ser Glu Phe Thr Val Glu Ser Leu Ile
275 280 285

Ala Thr Val Thr Asp Met Phe Gly Ala Gly Thr Glu Thr Thr Ser Thr
290 295 300

Thr Leu Arg Tyr Gly Leu Leu Leu Leu Leu Lys Tyr Pro Glu Val Thr
305 310 315 320

Ala Lys Val Gln Glu Glu Ile Glu Cys Val Val Gly Arg Asn Arg Ser
325 330 335

Pro Cys Met Gln Asp Arg Ser His Met Pro Tyr Thr Asp Ala Val Val
340 345 350

His Glu Ile Gln Arg Tyr Ile Asp Leu Leu Pro Thr Asn Leu Pro His
355 360 365

Ala Val Thr Cys Asp Val Lys Phe Lys Asn Tyr Leu Ile Pro Lys Gly
370 375 380

Met Thr Ile Ile Thr Ser Leu Thr Ser Val Leu His Asn Asp Lys Glu
385 390 395 400

Phe Pro Asn Pro Glu Met Phe Asp Pro Gly His Phe Leu Asp Lys Ser
405 410 415

Gly Asn Phe Lys Lys Ser Asp Tyr Phe Met Pro Phe Ser Ala Gly Lys
420 425 430

Arg Met Cys Met Gly Glu Gly Leu Ala Arg Met Glu Leu Phe Leu Phe
435 440 445

Leu Thr Thr Ile Leu Gln Asn Phe Asn Leu Lys Ser Gln Val Asp Pro
450 455 460

Lys Asp Ile Asp Ile Thr Pro Ile Ala Asn Ala Phe Gly Arg Val Pro
465 470 475 480

Pro Leu Tyr Gln Leu Cys Phe Ile Pro Val
485 490

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2258 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

AGTGAAAGCC CGCAGTTGTC TTACTAAGAA GAGAAGCCTT CAATGGATCC AGCTGTGGCT 60

CTGGTGCTCT GTCTCTCCTG TTTGTTTCTC CTTTCACTCT GGAGGCAGAG CTCTGGAAGA 120 

GGGAGGCTCC CGTCTGGCCC CACTCCTCTC CCGATTATTG GAAATATCCT GCAGTTAGAT 180 

GTTAAGGACA TGAGCAAATC CTTAACCAAT TTCTCAAAAG TCTATGGCCC TGTGTTCACT 240

GTGTATTTTG GCCTGAAGCC CATTGTGGTG TTGCATGGAT ATGAAGCAGT GAAGGAGGCC 300 

CTGATTGATC ATGGAGAGGA GTTTTCTGGA AGAGGAAGTT TTCCAGTGGC TGAAAAAGTT 360

AACAAAGGAC TTGGAATCCT TTTCAGCAAT GGAAAGAGAT GGAAGGAGAT CCGGCGTTTC 420 

TGCCTCATGA CTCTGCGGAA TTTTGGGATG GGGAAGAGGA GCATCGAGGA CCGTGTTCAA 480

GAGGAAGCCC GCTGCCTTGT GGAGGAGTTG AGAAAAACCA ATGCCTCACC CTGTGATCCC 540

ACTTTCATCC TGGGCTGTGC TCCCTGCAAT GTGATCTGCT CTGTTATTTT CCATGATCGA 600

TTTGATTATA AAGATCAGAG GTTTCTTAAC TTGATGGAAA AATTCAATGA AAACCTCAGG 660

ATTCTGAGCT CTCCATGGAT CCAGGTCTGC AATAATTTCC CTGCTCTCAT CGATTATCTC 720

CCAGGAAGTC ATAATAAAAT AGCTGAAAAT TTTGCTTACA TTAAAAGTTA TGTATTGGAG 780

AGAATAAAAG AACATCAAGA ATCCCTGGAC ATGAACAGTG CTCGGGACTT TATTGATTGT 840

TTCCTGATCA AAATGGAACA GGAAAAGCAC AATCAACAGT CTGAATTTAC TGTTGAAAGC 900

TTGATAGCCA CTGTAACTGA TATGTTTGGG GCTGGAACAG AGACAACGAG CACCACTCTG 960

AGATATGGAC TCCTGCTCCT GCTGAAGTAC CCAGAGGTCA CAGCTAAAGT CCAGGAAGAG 1020

ATTGAATGTG TAGTTGGCAG AAACCGGAGC CCCTGTATGC AGGACAGGAG TCACATGCCC 1080

TACACAGATG CTGTGGTGCA CGAGATCCAG AGATACATTG ACCTCCTCCC CACCAACCTG 1140

CCCCATGCAG TGACCTGTGA TGTTAAATTC AAAAACTACC TCATCCCCAA GGGCATGACC 1200

ATAATAACAT CCCTGACTTC TGTGCTGCAC AATGACAAAG AATTCCCCAA CCCAGAGATG 1260

TTTGACCCTG GCCACTTTCT GGATAAGAGT GGCAACTTTA AGAAAAGTGA CTACTTCATG 1320

CCTTTCTCAG CAGGAAAACG GATGTGTATG GGAGAGGGCC TGGCCCGCAT GGAGCTGTTT 1380

TTATTCCTGA CCACCATTTT GCAGAACTTT AACCTGAAAT CTCAGGTTGA CCCAAAGGAT 1440

ATTGACATCA CCCCCATTGC CAATGCATTT GGTCGTGTGC CACCCTTGTA CCAGGTCTGC 1500

TTCATTCCTG TCTGAAGAAG GGCAGATAGT TTGGCTGCTC CTGTGCTGTC ACCTGCAATT 1560

CTCCCTTATC AGGGCCATTG GCCTCTCCCT TCTCTCTATG AGGGATATTT TCTCTGACTT 1620

GTCAATCCAC ATCTTCCCAT TCCCTCAAGA TCCAATGAAC ATCCAACCTC CATTAAAGAG 1680

AGTTTCTTGG GTCACTTCCT AAATATATCT GCTATTCTCC ATACTCTGTA TCACTTGTAT 1740

TGACCACCAC ATATGCTAAT ACCTATCTAC TGCTGAGTTG TCAGTATGTT ATCACTATAA 1800

AACAAAGAAA AATGATTAAT AAATGACAAT TCAGAGCCAT TTATTCTCTG CATGCTCTAG 1860

ATAAAAATGA TTATTATTTA CTGGGTCAGT TCTTAGATTT CTTTCTTTTG AGTAAAATGA 1920

AAGTAAGAAA TGAAAGAAAA TAGAATGTGA AGAGGCTGTG CTGGCCCTCA TAGTGTTAAG 1980

CACAAAAAGG GAGAAAGGTA AGAGGGTAGG AAAGCTGTTT TAGCTAAATG CCACCTAGAG 2040

TTATTGGAGG TCTGAATTTG GAAAAAAAAA CTATGTCCAG GAGCAGCTGT AACCTGTAGG 2100

GAAATAATGG AACAATCATC CATAAGAGGG ATGAACATTA AGTGTTTGAA TTCATGCTCT 2160

GCTTTTGTGT TACTGTAAAC ACAAGATCAA GATTTGGATA ATCTTTTTCC TTTGTGTTTC 2220

CAACTTAGAT CATGTCTAAA TATATGCTTT CATATGGC 2258

(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 490 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 
(iii) HYPOTHETICAL: YES 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

Met Asp Pro Xaa Val Val Leu Val Leu Cys Leu Ser Cys Leu Leu Leu
1 5 10 15

Leu Ser Leu Trp Arg Gln Ser Ser Gly Arg Gly Lys Leu Pro Pro Gly
20 25 30

Pro Thr Pro Leu Pro Xaa Ile Gly Asn Ile Leu Gln Ile Asp Xaa Lys
35 40 45

Asp Ile Ser Lys Ser Leu Thr Asn Xaa Ser Lys Val Tyr Gly Pro Val
50 55 60

Phe Thr Xaa Tyr Phe Gly Leu Lys Pro Ile Val Val Leu His Gly Tyr
65 70 75 80

Glu Ala Val Lys Glu Ala Leu Ile Asp Leu Gly Glu Glu Phe Ser Gly
85 90 95

Arg Gly Xaa Phe Pro Leu Ala Glu Arg Ala Asn Xaa Gly Xaa Gly Ile
100 105 110

Val Phe Ser Asn Gly Lys Arg Trp Lys Glu Ile Arg Arg Phe Ser Leu
115 120 125

Met Thr Leu Arg Asn Phe Gly Met Gly Lys Arg Ser Ile Glu Asp Arg
130 135 140

Val Gln Glu Glu Ala Arg Cys Leu Val Glu Glu Leu Arg Lys Thr Lys
145 150 155 160

Ala Ser Pro Cys Asp Pro Thr Phe Ile Leu Gly Cys Ala Pro Cys Asn
165 170 175

Val Ile Cys Ser Xaa Ile Phe His Lys Arg Phe Asp Tyr Lys Asp Gln
180 185 190

Gln Phe Leu Asn Leu Met Glu Lys Xaa Asn Glu Asn Ile Arg Ile Leu
195 200 205

Ser Ser Pro Trp Ile Gln Xaa Cys Asn Asn Phe Pro Xaa Xaa Ile Asp
210 215 220

Tyr Phe Pro Gly Thr His Asn Lys Leu Leu Lys Asn Val Ala Phe Met
225 230 235 240

Lys Ser Tyr Ile Leu Glu Lys Val Lys Glu His Gln Glu Ser Xaa Asp
245 250 255

Met Asn Asn Pro Arg Asp Phe Ile Asp Cys Phe Leu Ile Lys Met Glu
260 265 270

Xaa Glu Lys His Asn Gln Gln Ser Glu Phe Thr Ile Glu Ser Leu Xaa
275 280 285



Xaa Thr Xaa Xaa Asp Leu Phe Gly Ala Gly Thr Glu Thr Thr Ser Thr
290 295 300

Thr Leu Arg Tyr Xaa Leu Leu Leu Leu Leu Lys His Pro Glu Val Thr
305 310 315 320

Ala Lys Val Gln Glu Glu Ile Glu Arg Val Ile Gly Arg Asn Arg Ser
325 330 335

Pro Cys Met Gln Asp Arg Ser His Met Pro Tyr Thr Asp Ala Val Val
340 345 350

His Glu Xaa Gln Arg Tyr Ile Asp Leu Leu Pro Thr Ser Leu Pro His
355 360 365

Ala Val Thr Cys Asp Val Lys Phe Arg Asn Tyr Leu Ile Pro Lys Gly
370 375 380

Thr Thr Ile Leu Thr Ser Leu Thr Ser Val Leu His Asp Xaa Lys Glu
385 390 395 400

Phe Pro Asn Pro Glu Met Phe Asp Pro Gly His Phe Leu Asp Xaa Gly
405 410 415

Gly Asn Phe Lys Lys Ser Asp Tyr Phe Met Pro Phe Ser Ala Gly Lys
420 425 430

Arg Ile Cys Val Gly Glu Gly Leu Ala Arg Met Glu Leu Phe Leu Phe
435 440 445

Leu Thr Thr Ile Leu Gln Asn Phe Asn Leu Lys Ser Leu Val Asp Pro
450 455 460

Lys Xaa Leu Asp Thr Thr Pro Val Val Asn Gly Phe Ala Ser Val Pro
465 470 475 480

Pro Phe Tyr Gln Leu Cys Phe Ile Pro Val
485 490

(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1892 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: YES 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

AGTGAAAGCC CGCAGTTGTC TTACTAAGAA GAGAAGNCTT CAATGGATCC TNTTGTGGTC 60

CTNGTGCTCT GTCTCTCATG TTTGCTTCTC CTTTCACTCT GGAGACAGAG CTCTGGGAGA 120 

GGNAANCTCC CTCCTGGCCC CACTCCTCTC CCANTNATTG GAAATATCCT ACAGATAGAT 180 

NTTAAGGACA TCAGCAAATC CTTAACCAAT NTCTCAAAAG TCTATGGCCC TGTGTTCACT 240

NTGTATTTTG GCCTGAAACC CATAGTGGTG NTGCATGGAT ATGAAGCAGT GAAGGAAGCC 300

CTGATTGATC NTGGAGAGGA GTTTTCTGGA AGAGGCANTT TCCCACTGGC TGAAAGAGNT 360

AACANAGGAN TTGCAATCGT TTTCAGCAAT GGAAAGAGAT GGAAGGAGAT CCGGCGTTTC 420

TCCCTCATGA CGCTGCGGAA TTTTGGGATG GGGAAGAGGA GCATTGAGGA CCGTGTTCAA 480

GAGGAAGCCC GCTGCCTTGT GGAGGAGTTG AGAAAAACCA AGGCCTCACC CTGTGATCCC 540

ACTTTCATCC TGGGCTGTGC TCCCTGCAAT GTGATCTGCT CCNTTATTTT CCATAAACGN 600

TTTGATTATA AAGATCAGNA ATTTCTTAAC TTGATGGAAA AATTNAATGA AAACATCAGG 660 

ATTCTGAGCN CCCCNTGGAT CCAGNTCTGC AATAATTTNC CTCCTNTCAT TGATTATTTC 720

CCNGGAACTC ANAACAAATT ACTTAAAAAN GTTGCTTTTA TGAAAAGTTA TATTTTGGAG 780

AAAGTAAAAG AACACCAAGA ATCANTGGAC ATGAACAANC CTCGGGACTT TATTGATTGC 840

TTCCTGATCA AAATGGAGNA GGAAAAGCAC AACCAACAGT CTGAATTTAC TATTGAAAGC 900 

TTGGTANNCA CTGNAGCTGA NTTGTTTGGA GCTGGNACAG AGACAACAAG CACNACNCTG 960 

AGATATGNNC TCCTNCTCCT GCTGAAGCAC CCAGAGGTCA CAGCTAAAGT CCAGGAAGAG 1020

ATTGAACGTG TAATTGGCAG AAACCGGAGC CCCTGCATGC AGGACAGGAG CCACATGCCC 1080 

TACACAGATG CTGTGGTGCA CGAGNTCCAG AGATACATTG ACCTNCTCCC CACCAGCCTG 1140

CCCCATGCAG TGACCTGTGA NNTTAAATTC AGAAACTACC TCATCCCCAA GGGCACAACC 1200

ATANTAACNT CCCTGACTTC TGTGCTACAT GANNACAAAG AATTTCCCAA CCCAGAGATG 1260

TTTGACCCTN GNCACTTTCT GGATNANNGT GGCAANTTTA AGAAAAGTNA CTACTTCATG 1320 

CCTTTCTCAG CAGGAAAACG GATTTGTGTG GGAGANGGCC TGGCCCGCAT GGAGCTGTTT 1380 

TTATTCCTGA CCNCCATTTT ACAGAACTTT AACCTGAAAT CTCTGGTTGA CCCAAANGAC 1440 

CTTGACACCA CTCCAGTTGN CAATGGATTT GCTTCTGTGC CNCCCTTCTA CCAGCTNTGC 1500

TTCATTCCTG TCTGAAGAAG GGCAGATGGT CTGGCTGCTN CTGTGCTGTC NCNNNNNNTN 1560 

NNTTTNNTCT GGGGCAATTT CCNTCTTNCA TNNNTNTTNN TGCNNTTTNT CATCTGNCAT 1620 

CTCACANTNC NNCTTCCCTT ANCATCNAGN NACCATTNAN NNNCAATNTC CAAGAGNGTG 1680 

NNTTTNTTNN CTNTCCACCT ANATCTATCN NTNNNNCTNC TNTNTNTNNA TNACTTTGAT 1740 

TGTCCNCTAN TGATGNTAAT TNTTTAATAT TGNNTTATTG NNANNNTNTT ATNANTNANA 1800 

AANAAATGAT AATTNTNTNN AAATNNNAAG TCANTGCNNT TNANNATNTN CNNAATAAAA 1860 

AGCATTATTA TTTGCTGAAA AAAAGTCAGT TC 1892 

(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 33 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
GCAAGCTTAA AAAATGGATC CAGCTGTGGC TCT 33

(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 
GCAAGCTTGC CAAACTATCT GCCCTTCT 



(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
ACTTTTCAAT GTAAGCAAAT 



(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 
TTAGTAATTC TTTGAGATAT 



(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
CTGTTAGCTC TTTCAGCCAG



(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
GGAGCACAGC CCAGGATGAA 



(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 33 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 
GCAAGCTTAA AAAATGGATC CAGCTGTGGC TCT 



(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 
GCAAGCTTGC CAAACTATCT GCCCTTCT 



(2) INFORMATION FOR SEQ ID NO: 23:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:
TGGCCCTGAT AAGGGAGAAT



(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24 
ATCCAGAGAT ACATTGACCT C 



(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25 
CCATGAAGTG ACCTGTGATG 



(2) INFORMATION FOR SEQ ID NO:26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:
AAAGATGGAT AATGCCCCAG 



(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:
GAAGGAGATC CGGCGTTTCT



(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (primer)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28: 
GGCGTTTCTC CCTCATGACG 



(2) INFORMATION FOR SEQ ID NO:29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:29: 
TTGTCATTGT GCAG 



(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 
CACATGCCCT ACACA 



(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 
TGACGCTGCG GAATT



(2) INFORMATION FOR SEQ ID NO: 32:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 
GGACTTTATT GATTG 



(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:33: 
ATGATTCTCT TGTGGTCCT 



(2) INFORMATION FOR SEQ ID NO:34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 
AAAGATGGAT AATGCCCCCA G 
(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 34 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:35: 
GCAAGCTTAA AAAAATGGAA CCTTTTGTGG TCCT




(2) INFORMATION FOR SEQ ID NO:36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:36: 
GCAAGCTTGC CAGATGGGCT AGCATTCT 



(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 34 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:
GCAAGCTTAA AAAAATGGAT TCTCTTGTGG TCCT



(2) INFORMATION FOR SEQ ID NO: 38:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:38: 
GCAAGCTTGC CAGGCCATCT GCTCTTCT 



(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 34 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:
GCAAGCTTAA AAAAATGGAT TCTCTTGTGG TCCT



(2) INFORMATION FOR SEQ ID NO: 40:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 
GCAAGCTTGC CAGACCATCT GTGCTTCT 



(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligo) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41: 
AGCTTAAAAA AATG 



(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (oligo) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:
GATCCATTTT TTTA 



(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:

Cys Ile Asp Tyr Leu Pro Gly Ser His Asn Lys Ile Ala Glu Asn Phe
1 5 10 15

Ala



(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : unknown 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44: 

Cys Leu Ala Phe Met Glu Ser Asp Ile Leu Glu Lys Val Lys
1 5 10 



(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 284 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 2..283



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45:

A TTG AAT GAA AAC ATC AGG ATT GTA AGC ACC CCC TGG ATC CAG ATA 46
Leu Asn Glu Asn Ile Arg Ile Val Ser Thr Pro Trp Ile Gln Ile
1 5 10 15

TGC AAT AAT TTT CCC ACT ATC ATT GAT TAT TTC CCG GGA ACC CAT AAC 94
Cys Asn Asn Phe Pro Thr Ile Ile Asp Tyr Phe Pro Gly Thr His Asn
20 25 30

AAA TTA CTT AAA AAC CTT GCT TTT ATG GAA AGT GAT ATT TTG GAG AAA 142
Lys Leu Leu Lys Asn Leu Ala Phe Met Glu Ser Asp Ile Leu Glu Lys
35 40 45

GTA AAA GAA CAC CAA GAA TCG ATG GAC ATC AAC AAC CCT CGG GAC TTT 190
Val Lys Glu His Gln Glu Ser Met Asp Ile Asn Asn Pro Arg Asp Phe
50 55 60

ATT GAT TGC TTC CTG ATC AAA ATG GAG AAG GAA AAG CAA AAC CAA CAG 238
Ile Asp Cys Phe Leu Ile Lys Met Glu Lys Glu Lys Gln Asn Gln Gln
65 70 75

TCT GAA TTC ACT ATT GAA AAC TTG GTA ATC ACT GCA GCT GAC TTA 283
Ser Glu Phe Thr Ile Glu Asn Leu Val Ile Thr Ala Ala Asp Leu
80 85 90

C 284

(2) INFORMATION FOR SEQ ID NO: 46:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 94 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46:

Leu Asn Glu Asn Ile Arg Ile Val Ser Thr Pro Trp Ile Gln Ile Cys
1 5 10 15

Asn Asn Phe Pro Thr Ile Ile Asp Tyr Phe Pro Gly Thr His Asn Lys
20 25 30

Leu Leu Lys Asn Leu Ala Phe Met Glu Ser Asp Ile Leu Glu Lys Val
35 40 45

Lys Glu His Gln Glu Ser Met Asp Ile Asn Asn Pro Arg Asp Phe Ile
50 55 60

Asp Cys Phe Leu Ile Lys Met Glu Lys Glu Lys Gln Asn Gln Gln Ser
65 70 75 80

Glu Phe Thr Ile Glu Asn Leu Val Ile Thr Ala Ala Asp Leu
85 90 

(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 244 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 44..103

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:47: 

ATTGAATGAA AACATCAGGA TTGTAAGCAC CCCCTGGATC CAG GAA CCC ATA ACA 55 

Glu Pro Ile Thr
1

AAT TAC TTA AAA ACC TTG CTT TTA TGG AAA GTG ATA TTT TGG AGA AAG 103
Asn Tyr Leu Lys Thr Leu Leu Leu Trp Lys Val Ile Phe Trp Arg Lys
5 10 15 20 

TAAAAGAACA CCAAGAATCG ATGGACATCA ACAACCCTCG GGACTTTATT GATTGCTTCC 163 

TGATCAAAAT GGAGAAGGAA AAGCAAAACC AACAGTCTGA ATTCACTATT GAAAACTTGG 223 

TAATCACTGC AGCTGACTTA C 244

(2) INFORMATION FOR SEQ ID NO: 48:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48: 

Glu Pro Ile Thr Asn Tyr Leu Lys Thr Leu Leu Leu Trp Lys Val Ile
1 5 10 15

Phe Trp Arg Lys 
20 
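
The relationship between SEQ ID NO: 47 and SEQ ID NO: 48 can be checked with a short script. The following is a minimal sketch, not part of the original listing: it assumes the standard genetic code, uses illustrative names (`translate`, `CODON_TABLE`), and takes the first 113 nucleotides of SEQ ID NO: 47 as transcribed above. It translates the annotated CDS (positions 44..103) and reads the codon that follows it.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string codon by codon (one-letter amino acid codes)."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# First 113 nucleotides of SEQ ID NO: 47, as transcribed from the listing.
SEQ47 = (
    "ATTGAATGAAAACATCAGGATTGTAAGCACCCCCTGGATCCAG"   # positions 1..43
    "GAACCCATAACA" "AATTACTTAAAA" "ACCTTGCTTTTA"     # CDS, positions 44..79
    "TGGAAAGTGATA" "TTTTGGAGAAAG"                    # CDS, positions 80..103
    "TAAAAGAACA"                                     # positions 104..113
)

cds = SEQ47[43:103]          # 1-based positions 44..103
peptide = translate(cds)     # one-letter form of SEQ ID NO: 48
next_codon = SEQ47[103:106]  # codon immediately after the CDS

print(peptide)
print(next_codon, CODON_TABLE[next_codon])
```

Under these assumptions the 20-residue peptide matches SEQ ID NO: 48, and the codon following position 103 is a stop codon, which is consistent with the short reading frame annotated for this sequence.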

(2) INFORMATION FOR SEQ ID NO: 49: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 83 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(ix) FEATURE: 

(A) NAME/KEY: intron 

(B) LOCATION: 1..32 

(ix) FEATURE: 

(A) NAME/KEY: exon 

(B) LOCATION: 33..83

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49: 
TTTTAATTTA ATAAATTATT GTTTTCTCTT AGATATGCAA TAATTTTCCC ACTATCATTG
ATTATTTCCC GGGAACCCAT AAC 

(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 83 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(ix) FEATURE: 

(A) NAME/KEY: intron 

(B) LOCATION: 1..72 

(ix) FEATURE: 

(A) NAME/KEY: exon 

(B) LOCATION: 73..83

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 
TTTTAATTTA ATAAATTATT GTTTTCTCTT AGATATGCAA TAATTTTCCC ACTATCATTG
ATTATTTCCA AGGAACCCAT AAC

(2) INFORMATION FOR SEQ ID NO: 51:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 826 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51: 
ATGGTGATGT AGNAANTCAT NCCATCTTAT ATTTCNAGAG TGTAGAGGAG GATTGTTGNG 60
GAAGTAAGAG GNNTAAGATA GAGATGCNTT TATACTATCC CAAGCAGGGA TRAGTCTAGG 120
AAATGATTAT CGTCTTTGAT TCTCTTGTCA GRATTTTCTT TCTCMNATCT TGTATAATCA 180
GAGAATTACT ACACATGGAC AATRAARATT TCCCCNTCCA GATANACAAT ATATTTTATT 240
TATATTTATA GTTTTAAATT ACAACCAGAG CTTGGCATAT TGTATCTATA CCTTTAATAA 300
ATGCTTTTAA TTTAATAAAT TATTGTTTTC TCTTAGATAT GCAATAATTT TCCCACTATC 360
ATTGATTATT TCCCGGGAAC CCATAACAAA TTACTTAAAA ACCTTGCTTT TATGGAAAGT 420
GATATTTTGG AGAAAGTAAA AGAACACCAA GAATCGATGG ACATCAACAA CCCTCGGGAC 480
TTTATTGATT GCTTCCTGAT CAAAATGGAG AAGGTAAAAT GTTAACAAAA GCTTAGTTAT 540
GTGACTGCTT GCGTATKTGT GATTCATTGA CTAGTTGKGT GTTTACTACG GATGTTTAAC 600
AGGTCAAGGA GTAATGCTTG AGAAGCATAT TTAAGTTTTT ATTGTATGCA TGAATATCCA 660
GTAAGCATCA TAGAAAATGT AAAATTAANT TGTTAAATAA TTAGAATACA TAGAAGAAAT 720
TGTTTAGATA AATATNATCT ATCTGAACAA TAAGGATGTC AGGATAGGAA AAGCTCTGTT 780
TCTGCAGCTT CCAGTGGAGA TCAGCACAGG AGGGAACTTA TTTTTT 826



(2) INFORMATION FOR SEQ ID NO: 52:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 655 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)



(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 263..421



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52:
AGGGAAAAGA CAAATAGGCC GGGGATGNAA ATTTAGCATG TGAGCAACCT TANTTAACCA 60
GCTAGGCTGT AATTGNTAAT TCGAGANTAA TGTNAAAGTG ATGTGTTGAT TTTATGCATG 120
CCNNACTCNT TTTTGCTTTT AAGGGGAGTC ATAGGTAAGA TATTACTTAA AATTTCTAAA 180
CTATTATTAT CTGTTAACTA ATATGAAGTG TTTTATATCT AATGTTTACT CATATTTTAA 240

AATTGTTTCC AATCATTTAG CT TCA CCC TGT GAT CCC ACT TTC ATC CTG GGC 292

Ser Pro Cys Asp Pro Thr Phe Ile Leu Gly
1 5 10

TGT GCT CCC TGC AAT GTG ATC TGC TCC ATT ATT TTC CAG AAA CGT TTC 340
Cys Ala Pro Cys Asn Val Ile Cys Ser Ile Ile Phe Gln Lys Arg Phe
15 20 25

GAT TAT AAA GAT CAG CAA TTT CTT AAC TTG ATG GAA AAA TTG AAT GAA 388
Asp Tyr Lys Asp Gln Gln Phe Leu Asn Leu Met Glu Lys Leu Asn Glu
30 35 40

AAC ATC AGG ATT GTA AGC ACC CCC TGG ATC CAG GTAAGGACA AGTTTTGTGC 440

Asn Ile Arg Ile Val Ser Thr Pro Trp Ile Gln
45 50

TTCCTGAGAA ACCACTTACA GTCTTTTTTT CTGGGAAATC CAAAATTCTA TATTGACCAA 500

GCCCTGAAGT ACATTTGTGA ATACTACAGT CTTGCCTAGA CAGCCATGGG GTGAATATCT 560 
GGAAAAGATG GCAAAGNTCT TTATTTTATG CACAGGAAAT GAATATCCCA ATATAGATCA 620 
GGCTTCTAAG CCCATTAGCT CCCTGATCAG TGTTT 655 

(2) INFORMATION FOR SEQ ID NO:53: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 53 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53: 

Ser Pro Cys Asp Pro Thr Phe Ile Leu Gly Cys Ala Pro Cys Asn Val
1 5 10 15

Ile Cys Ser Ile Ile Phe Gln Lys Arg Phe Asp Tyr Lys Asp Gln Gln
20 25 30

Phe Leu Asn Leu Met Glu Lys Leu Asn Glu Asn Ile Arg Ile Val Ser
35 40 45

Thr Pro Trp Ile Gln
50 

(2) INFORMATION FOR SEQ ID NO: 54: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 292 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54: 
ATGAAGTGTT TTATATCTAA TGTTTACTCA TATTTTAAAA TTGTTTCCAA TCATTTAGCT 60
TCACCCTGTG ATCCCACTTT CATCCTGGGC TGTGCTCCCT GCAATGTGAT CTGCTCCATT 120
ATTTTCCAGA AACGTTTCGA TTATAAAGAT CAGCAATTTC TTAACTTGAT GGAAAAATTG 180
AATGAAAACA TCAGGATTGT AAGCACCCCC TGAATCCAGG TAAGGACAAG TTTTGTGCTT 240
CCTGAGAAAC CACTTACAGT CTTTTTTTCT GGGAAATCCA AAATTCTATA TT 292



(2) INFORMATION FOR SEQ ID NO: 55: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55:
AATTACAACC AGAGCTTGGC 



(2) INFORMATION FOR SEQ ID NO: 56: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56 
TATCACTTTC CATAAAAGCA AG 



(2) INFORMATION FOR SEQ ID NO: 57:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57: 
TATTATCTGT TAACTAACTA ATATGA 



(2) INFORMATION FOR SEQ ID NO: 58: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58:
ACTTCAGGGC TTGGTCAATA 20

(2) INFORMATION FOR SEQ ID NO: 59:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59: 
ATTGAATGAA AACATCAGGA TTG 23 

(2) INFORMATION FOR SEQ ID NO: 60: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60: 
GTAAGTCAGC TGCAGTGATT A 21 

(2) INFORMATION FOR SEQ ID NO: 61: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 826 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61: 

ATGGTGATGT AGNAANTCAT NCCATCTTAT ATTTCNAGAG TGTAGAGGAG GATTGTTGNG 60

GAAGTAAGAG GNNTAAGATA GAGATGCNTT TATACTATCC CAAGCAGGGA TRAGTCTAGG 120

AAATGATTAT CGTCTTTGAT TCTCTTGTCA GRATTTTCTT TCTCMNATCT TGTATAATCA 180

GAGAATTACT ACACATGGAC AATRAARATT TCCCCNTCCA GATANACAAT ATATTTTATT 240

TATATTTATA GTTTTAAATT ACAACCAGAG CTTGGCATAT TGTATCTATA CCTTTAATAA 300

ATGCTTTTAA TTTAATAAAT TATTGTTTTC TCTTAGATAT GCAATAATTT TCCCACTATC 360

ATTGATTATT TCCCAGGAAC CCATAACAAA TTACTTAAAA ACCTTGCTTT TATGGAAAGT 420

GATATTTTGG AGAAAGTAAA AGAACACCAA GAATCGATGG ACATCAACAA CCCTCGGGAC 480

TTTATTGATT GCTTCCTGAT CAAAATGGAG AAGGTAAAAT GTTAACAAAA GCTTAGTTAT 540

GTGACTGCTT GCGTATKTGT GATTCATTGA CTAGTTGKGT GTTTACTACG GATGTTTAAC 600

AGGTCAAGGA GTAATGCTTG AGAAGCATAT TTAAGTTTTT ATTGTATGCA TGAATATCCA 660

GTAAGCATCA TAGAAAATGT AAAATTAANT TGTTAAATAA TTAGAATACA TAGAAGAAAT 720

TGTTTAGATA AATATNATCT ATCTGAACAA TAAGGATGTC AGGATAGGAA AAGCTCTGTT 780

TCTGCAGCTT CCAGTGGAGA TCAGCACAGG AGGGAACTTA TTTTTT 826

WHAT IS CLAIMED IS:

1. A purified cytochrome P450 2C19 polypeptide
comprising an amino acid sequence having at least 97% sequence
identity with the amino acid sequence designated SEQ. ID.
No. 1.

2. A purified DNA segment encoding the purified
polypeptide of claim 1.

3. A stable cell line comprising an exogenous DNA
segment encoding a cytochrome P450 2C19 polypeptide of
claim 1, the DNA segment capable of being expressed in the
cell line.

4. A method of screening for a drug that is
metabolized by S-mephenytoin 4'-hydroxylase activity, the
method comprising the steps of:

contacting the drug with a cytochrome P450 2C19
polypeptide of claim 1; and

detecting a metabolic product resulting from an
interaction between the drug and the polypeptide, the presence
of the product indicating the drug is metabolized by the
S-mephenytoin 4'-hydroxylase activity.

5. A method of diagnosing a patient having a
deficiency in S-mephenytoin 4'-hydroxylase activity, the
method comprising:

obtaining a sample of nucleic acids from the
patient; and

analyzing a cytochrome P450 2C19 DNA sequence
from the nucleic acids in the sample for the presence of a
polymorphism indicative of the deficiency.

6. The method of claim 5, further comprising the
step of amplifying the cytochrome P450 2C19 DNA sequence.

7. The method of claim 6, wherein the P450 2C19
DNA sequence is genomic.

8. The method of claim 7, wherein the amplifying
step is primed from a forward primer sufficiently
complementary with a first subsequence of the antisense strand
of the 2C19 sequence to hybridize therewith, and a reverse
primer sufficiently complementary to a second subsequence of
the sense strand of the 2C19 sequence to hybridize therewith.
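
The primer geometry recited in claim 8 can be illustrated in silico. The following is a minimal sketch, not the procedure of the specification: it assumes exact matching between primers and template, uses SEQ ID NO: 55 and SEQ ID NO: 56 from the sequence listing as the forward and reverse primers, and takes as template a stretch of the wildtype genomic sequence (SEQ ID NO: 51, positions 241 onward) as transcribed above. The function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def amplicon(sense, forward, reverse):
    """Sense-strand segment bounded by the two primers, or None.

    The forward primer is complementary to the antisense strand, so it reads
    the same as the sense strand. The reverse primer is complementary to the
    sense strand, so its reverse complement is searched for on the sense strand.
    """
    start = sense.find(forward)
    if start == -1:
        return None
    site = reverse_complement(reverse)
    end = sense.find(site, start + len(forward))
    if end == -1:
        return None
    return sense[start:end + len(site)]


# Wildtype genomic stretch transcribed from SEQ ID NO: 51 (assumed OCR-clean).
SENSE = (
    "TATATTTATAGTTTTAAATTACAACCAGAGCTTGGCATATTGTATCTATACCTTTAATAA"
    "ATGCTTTTAATTTAATAAATTATTGTTTTCTCTTAGATATGCAATAATTTTCCCACTATC"
    "ATTGATTATTTCCCGGGAACCCATAACAAATTACTTAAAAACCTTGCTTTTATGGAAAGT"
    "GATATTTTGGAGAAAGTAAA"
)
FORWARD = "AATTACAACCAGAGCTTGGC"     # SEQ ID NO: 55
REVERSE = "TATCACTTTCCATAAAAGCAAG"   # SEQ ID NO: 56

pcr_product = amplicon(SENSE, FORWARD, REVERSE)
print(pcr_product)
```

In this sketch the predicted product begins with the forward primer, ends with the reverse complement of the reverse primer, and spans the CCCGGG motif of the wildtype sequence. Real amplification also depends on annealing conditions and mismatch tolerance, which the exact-match model ignores.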

9. The method of claim 8, wherein the polymorphism 
occurs at nucleotide 681 of the coding region of the P450 2C19 
DNA genomic sequence. 

10. The method of claim 9, wherein the first 
subsequence of the sense strand is upstream from nucleotide 
681 of the coding region, and the second subsequence of the 
antisense strand is downstream of nucleotide 681 of the coding 
region. 

11. The method of claim 10, wherein the analyzing 
step comprises digesting the amplified DNA segment with a 
restriction enzyme that recognizes a site including nucleotide 
681 of the coding region. 
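
The restriction analysis of claim 11 can be sketched as a string operation. This is a hedged illustration only: the claim does not name an enzyme, so the recognition site CCCGGG and the blunt cut in its middle are assumptions chosen because that motif overlaps the polymorphic position in the wildtype sequence (SEQ ID NO: 49) and is absent from the mutant sequence as it reads in SEQ ID NO: 61. The `digest` helper is hypothetical.

```python
def digest(fragment, site, cut_offset):
    """Cut `fragment` at every occurrence of `site`.

    `cut_offset` is the number of bases of the site that stay on the
    upstream piece. Returns the list of resulting pieces in order.
    """
    pieces = []
    start = 0
    pos = fragment.find(site)
    while pos != -1:
        pieces.append(fragment[start:pos + cut_offset])
        start = pos + cut_offset
        pos = fragment.find(site, pos + 1)
    pieces.append(fragment[start:])
    return pieces


# Short stretches around the polymorphic site, transcribed from the listing.
WILDTYPE = "ACTATCATTGATTATTTCCCGGGAACCCATAAC"   # from SEQ ID NO: 49
MUTANT = "ACTATCATTGATTATTTCCCAGGAACCCATAAC"     # from SEQ ID NO: 61

SITE, CUT = "CCCGGG", 3   # assumed recognition site and cut position

wildtype_pieces = digest(WILDTYPE, SITE, CUT)
mutant_pieces = digest(MUTANT, SITE, CUT)

print([len(p) for p in wildtype_pieces])
print([len(p) for p in mutant_pieces])
```

Under these assumptions the wildtype stretch is cut into two pieces while the mutant stretch stays intact, so the two alleles would be distinguishable by fragment size.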

12. The method of claim 8, wherein the polymorphism 
occurs at nucleotide 636 of the coding region of the P450 2C19 
DNA genomic sequence. 

13. The method of claim 12, wherein the first 
subsequence of the sense strand is upstream from nucleotide 
636 of the coding region, and the second subsequence of the 
antisense strand is downstream of nucleotide 636 of the coding 
region. 

14. The method of claim 13, wherein the analyzing
step comprises digesting the amplified DNA segment with a
restriction enzyme that recognizes a site including nucleotide
636 of the coding region.

15. The method of claim 8, wherein the polymorphism
occurs at nucleotide 636 or 681 of the coding region of the
P450 2C19 DNA genomic sequence, wherein the first subsequence
of the sense strand is upstream from nucleotide 636 of the
coding region, and the second subsequence of the antisense
strand is downstream of nucleotide 681 of the coding region.

16. The method of claim 9, wherein the forward 

primer has 

about 10-50 contiguous nucleotides from the 
wildtype 2C19 sequence (SEQ. ID. No. 51) shown in Fig. 16 
including the nucleotide at position 681 of the coding region; 

wherein the forward primer primes amplification 
from the complement of the wildtype 2C19 sequence without 
priming amplification from the complement of the mutant 2C19 
sequence shown in Fig. 16 (SEQ. ID. No. 61). 

17. The method of claim 16, wherein the 3' 
nucleotide of the forward primer is the nucleotide at position 
681. 
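
The allele-specific forward primers of claims 16 and 17 place the polymorphic nucleotide at the 3' end of the primer. The following is a simplified sketch under stated assumptions: a primer is treated as priming only when it matches the template exactly, the primer length of 20 is an arbitrary choice within the claimed range of about 10-50 nucleotides, and the template stretches are transcribed from SEQ ID NO: 51 (wildtype) and SEQ ID NO: 61 (mutant). Function names are illustrative.

```python
def allele_specific_forward(sense, snp_index, length=20):
    """Forward primer whose 3'-terminal base is the polymorphic position."""
    return sense[snp_index - length + 1:snp_index + 1]


def primes(primer, sense):
    """Toy criterion: the primer must match the sense strand exactly."""
    return primer in sense


# Stretches around the polymorphic site, transcribed from the listing.
WILDTYPE = "GCAATAATTTTCCCACTATCATTGATTATTTCCCGGGAACCCATAAC"
MUTANT = "GCAATAATTTTCCCACTATCATTGATTATTTCCCAGGAACCCATAAC"

# Index of the single position at which the two stretches differ.
SNP = next(i for i, (w, m) in enumerate(zip(WILDTYPE, MUTANT)) if w != m)

wt_primer = allele_specific_forward(WILDTYPE, SNP)
mut_primer = allele_specific_forward(MUTANT, SNP)

print(wt_primer, primes(wt_primer, WILDTYPE), primes(wt_primer, MUTANT))
print(mut_primer, primes(mut_primer, MUTANT), primes(mut_primer, WILDTYPE))
```

In this model each primer matches only its own allele, because the two primers differ at their 3'-terminal base. Actual discrimination in amplification depends on polymerase and reaction conditions, which the exact-match criterion does not capture.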

18. The method of claim 9, wherein the reverse

primer has 

about 10-50 contiguous nucleotides from the 
complement of the wildtype 2C19 sequence (SEQ. ID. No. 51) 
shown in Fig. 16 including the complement to nucleotide 681 of 
the coding region; 

wherein the reverse primer primes amplification 
from the wildtype 2C19 sequence without priming amplification 
from the mutant 2C19 sequence (SEQ. ID. No. 61) shown in 
Fig. 16. 

19. The method of claim 18, wherein the 3'
nucleotide of the reverse primer is the complement of the
nucleotide at position 681.

20. The method of claim 9, wherein the forward

primer has

about 10-50 contiguous nucleotides from the
mutant 2C19 sequence shown in Fig. 16 including the nucleotide
at position 681 of the coding sequence,

wherein the forward primer primes amplification
from the complement of the mutant 2C19 sequence (SEQ. ID.
No. 61) without priming amplification from the complement of
the wildtype 2C19 (SEQ. ID. No. 51) sequence shown in Fig. 16.

21. The method of claim 20, wherein the 3'
nucleotide of the forward primer is the nucleotide at
position 681.

22. The method of claim 9, wherein the reverse

primer has

about 10-50 contiguous nucleotides from the
complement of the mutant 2C19 sequence (SEQ. ID. No. 61) shown
in Fig. 16 including the complement to nucleotide 681 of the
coding region;

wherein the reverse primer primes amplification
from the mutant 2C19 sequence without priming amplification
from the wildtype 2C19 (SEQ. ID. No. 51) sequence shown in
Fig. 16.

23. The method of claim 22, wherein the 3'
nucleotide of the reverse primer is the complement of the
nucleotide at position 681.

24. The method of claim 12, wherein the forward

primer has

about 10-50 contiguous nucleotides from the
wildtype 2C19 sequence (SEQ. ID. No. 52) shown in Fig. 17
including the nucleotide at position 636 of the coding region;

wherein the forward primer primes amplification
from the complement of the wildtype 2C19 sequence (SEQ. ID.
No. 54) without priming amplification from the complement of
the mutant 2C19 sequence shown in Fig. 17.

25. The method of claim 12, wherein the reverse

primer has

about 10-50 contiguous nucleotides from the
complement of the wildtype 2C19 sequence (SEQ. ID. No. 52)
shown in Fig. 17 including the complement to nucleotide 636 of
the coding region;

wherein the reverse primer primes amplification
from the wildtype 2C19 sequence without priming amplification
from the mutant 2C19 sequence (SEQ. ID. No. 54) shown in
Fig. 17.

26. The method of claim 12, wherein the forward

primer has

about 10-50 contiguous nucleotides from the
mutant 2C19 sequence (SEQ. ID. No. 54) shown in Fig. 17
including the nucleotide at position 636 of the coding
sequence,

wherein the forward primer primes amplification
from the complement of the mutant 2C19 sequence without
priming amplification from the complement of the wildtype 2C19
sequence (SEQ. ID. No. 52) shown in Fig. 17.

27. The method of claim 12, wherein the reverse

primer has

about 10-50 contiguous nucleotides from the
complement of the mutant 2C19 sequence (SEQ. ID. No. 54) shown
in Fig. 17 including the complement to nucleotide 636 of the
coding region;

wherein the reverse primer primes amplification
from the mutant 2C19 sequence without priming amplification
from the wildtype 2C19 sequence (SEQ. ID. No. 52) shown in
Fig. 17.

28. The method of claim 6, wherein the segment of
the 2C19 sequence to be amplified is a cDNA sequence, and the
method further comprises the step of reverse transcribing mRNA
in the sample to produce the cDNA sequence.

29. The method of claim 28, wherein the forward
primer comprises about 10-50 contiguous nucleotides upstream
of nucleotide 643 of the coding region of the wildtype 2C19
cDNA sequence (SEQ. ID. No. 49) shown in Fig. 12 and
hybridizes to the complement of the 2C19 sequence upstream
from nucleotide 643 of the coding region, and the reverse
primer comprises about 10-50 contiguous nucleotides from the
complement of the wildtype 2C19 cDNA sequence (SEQ. ID. No. 49)
shown in Fig. 12 and hybridizes to the 2C19 sequence
downstream from nucleotide 682 of the coding region.

30. The method of claim 28, wherein the forward
primer hybridizes to the complement of the wildtype 2C19 cDNA
sequence (SEQ. ID. No. 49) shown in Fig. 12 between
nucleotides 643 and 682 without hybridizing to the complement
of the mutant 2C19 cDNA sequence (SEQ. ID. No. 50) shown in
Fig. 12.

31. The method of claim 30, wherein the reverse
primer hybridizes to the wildtype 2C19 cDNA sequence (SEQ. ID.
No. 49) shown in Fig. 12 between nucleotides 643 and 682
without hybridizing to the mutant 2C19 cDNA sequence (SEQ. ID.
No. 50) shown in Fig. 12.

32. The method of claim 28, wherein the forward
primer comprises about 10-50 contiguous nucleotides upstream
of nucleotide 636 of the coding region of the wildtype 2C19
cDNA sequence (SEQ. ID. No. 49) shown in Fig. 12, and the
reverse primer comprises about 10-50 contiguous nucleotides
from the complement of the wildtype 2C19 cDNA sequence (SEQ.
ID. No. 49) shown in Fig. 12 downstream from nucleotide 636 of
the coding region.

33. The method of claim 28, wherein the full-length
2C19 cDNA sequence is amplified.

34. The method of claim 33, further comprising the
step of sequencing a segment of the 2C19 cDNA sequence.

35. The method of claim 5 further comprising the
step of:

digesting the DNA with a restriction enzyme
that recognizes a site including nucleotide 636 or 681 of the
2C19 DNA sequence;

wherein:

the 2C19 DNA sequence is genomic; and

the analyzing step comprises detecting the
products resulting from the digestion by Southern blotting
with a labelled segment of the 2C19 DNA sequence as a probe.

36. A diagnostic kit comprising:

a forward primer sufficiently complementary
with a first subsequence of the antisense strand of a
double-stranded 2C19 genomic DNA sequence to hybridize therewith, and
a reverse primer sufficiently complementary with a second
subsequence of the sense strand of the 2C19 genomic sequence
to hybridize therewith;

wherein the first subsequence is upstream of
nucleotide 681 of the coding region, and second subsequence is
downstream of nucleotide 681 of the coding region.

37. The diagnostic kit of claim 36, wherein the
first subsequence is upstream from nucleotide 636 of the
coding region.

38. The diagnostic kit of claim 36, wherein the
forward primer has about 10-50 contiguous nucleotides from the
wildtype 2C19 sequence (SEQ. ID. No. 51) shown in Fig. 16, and
the reverse primer has about 10-50 contiguous nucleotides from
the complement of the wildtype 2C19 sequence (SEQ. ID. No. 51)
shown in Fig. 16.

39. The diagnostic kit of claim 38, further
comprising

a second forward primer sufficiently
complementary with a first subsequence of the antisense strand
of a double-stranded 2C19 genomic DNA sequence to hybridize
therewith, and a second reverse primer sufficiently
complementary with a second subsequence of the sense strand of
the 2C19 genomic sequence to hybridize therewith;

wherein the first subsequence is upstream of
nucleotide 636 of the coding region, and second subsequence is
downstream of nucleotide 636 of the coding region.

40. The diagnostic kit of claim 39, further
comprising a restriction enzyme that recognizes a site that
includes nucleotide 681 or nucleotide 636 of the coding
region.

41. A primer selected from the group consisting of:
(a) a first forward primer having:
about 10-50 contiguous nucleotides from
the wildtype 2C19 sequence (SEQ. ID. No. 51) shown in Fig. 16
including the nucleotide at position 681 of the coding region;
wherein the first forward primer primes
amplification from the complement of the wildtype 2C19
sequence without priming amplification from the complement of
the mutant 2C19 sequence (SEQ. ID. No. 61) shown in Fig. 16;

(b) a first reverse primer having:
about 10-50 contiguous nucleotides from
the complement of the wildtype 2C19 sequence (SEQ. ID. No. 51)
shown in Fig. 16 including the complement to nucleotide 681 of
the coding region;
wherein the first reverse primer primes
amplification from the wildtype 2C19 sequence without priming
amplification from the mutant 2C19 sequence shown in Fig. 16;

(c) a second forward primer having:
about 10-50 contiguous nucleotides from
the mutant 2C19 sequence (SEQ. ID. No. 61) shown in Fig. 16
including the nucleotide at position 681 of the coding sequence,
wherein the second forward primer primes
amplification from the complement of the mutant 2C19 sequence
without priming amplification from the complement of the
wildtype 2C19 sequence (SEQ. ID. No. 51) shown in Fig. 16; and

(d) a second reverse primer having:
about 10-50 contiguous nucleotides from
the complement of the mutant 2C19 sequence (SEQ. ID. No. 61)
shown in Fig. 16 including the complement to nucleotide 681 of
the coding region;
wherein the second reverse primer primes
amplification from the mutant 2C19 sequence without priming
amplification from the wildtype 2C19 sequence (SEQ. ID.
No. 51) shown in Fig. 16;

(e) a third forward primer having:
about 10-50 contiguous nucleotides from
the wildtype 2C19 sequence (SEQ. ID. No. 52) shown in Fig. 17
including the nucleotide at position 636 of the coding region;
wherein the first forward primer primes
amplification from the complement of the wildtype 2C19
sequence without priming amplification from the complement of
the mutant 2C19 sequence (SEQ. ID. No. 54) shown in Fig. 17;

(f) a third reverse primer having:
about 10-50 contiguous nucleotides from
the complement of the wildtype 2C19 sequence (SEQ. ID. No. 52)
shown in Fig. 17 including the complement to nucleotide 636 of
the coding region;
wherein the first reverse primer primes
amplification from the wildtype 2C19 sequence without priming
amplification from the mutant 2C19 sequence (SEQ. ID. No. 54)
shown in Fig. 17;

(g) a fourth forward primer having:
about 10-50 contiguous nucleotides from
the mutant 2C19 sequence (SEQ. ID. No. 54) shown in Fig. 17
including the nucleotide at position 636 of the coding
sequence,
wherein the second forward primer primes
amplification from the complement of the mutant 2C19 sequence
without priming amplification from the complement of the
wildtype 2C19 sequence (SEQ. ID. No. 52) shown in Fig. 17; and

(h) a fourth reverse primer having:
about 10-50 contiguous nucleotides from
the complement of the mutant 2C19 sequence (SEQ. ID. No. 54)
shown in Fig. 17 including the complement to nucleotide 681 of
the coding region;
wherein the fourth reverse primer primes
amplification from the mutant 2C19 sequence without priming
amplification from the wildtype 2C19 sequence (SEQ. ID.
No. 52) shown in Fig. 17.
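
The allele-specific primers of claim 41 can be sketched in simplified form. The model below assumes that a primer extends only when it matches its template exactly, including the 3'-terminal base placed on the polymorphic nucleotide; real priming also depends on annealing conditions. The 12-nucleotide primer length, the excerpts (taken from the wt and mutant rows of Fig. 17 around nucleotide 636) and all function names are illustrative choices, not part of the claims.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Excerpts around coding nucleotide 636, copied from the wt and mutant rows of Fig. 17.
WT = "TGTAAGCACCCCCTGGATCCAGGTAAGGAC"
MUT = "TGTAAGCACCCCCTGAATCCAGGTAAGGAC"
SNP_INDEX = 15  # 0-based offset of the G/A difference within the excerpts


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def forward_primer(sense: str, snp_index: int, length: int = 12) -> str:
    """Sense-strand primer whose 3'-terminal base is the polymorphic nucleotide."""
    return sense[snp_index - length + 1 : snp_index + 1]


def reverse_primer(sense: str, snp_index: int, length: int = 12) -> str:
    """Complement-strand primer whose 3'-terminal base pairs with the polymorphic nucleotide."""
    return reverse_complement(sense[snp_index : snp_index + length])


def forward_primes(primer: str, sense: str) -> bool:
    """Simplified rule: a forward primer must occur verbatim in the sense strand."""
    return primer in sense


def reverse_primes(primer: str, sense: str) -> bool:
    """Simplified rule: a reverse primer must occur verbatim in the complement strand."""
    return primer in reverse_complement(sense)


if __name__ == "__main__":
    for name, template in (("wildtype", WT), ("mutant", MUT)):
        fwd = forward_primer(template, SNP_INDEX)
        rev = reverse_primer(template, SNP_INDEX)
        print(name, "forward", fwd, forward_primes(fwd, WT), forward_primes(fwd, MUT))
        print(name, "reverse", rev, reverse_primes(rev, WT), reverse_primes(rev, MUT))
```

Under this exact-match assumption, each primer derived from one allele primes only from templates of that allele, which mirrors the "without priming amplification from" limitation recited for each primer.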
     351                                                400
2c   AAAGAgATGG AAGGAGATCC GGCGTTTCTc CCTCAtgAcg cTGCGGAATT
2c8  AAAGAGATGG AAGGAGATCC GGCGTTTCTC CCTCACAAAC TTGCGGAATT
25   AAAGAAATGG AAGGAGATCC GGCGTTTCTC CCTCATGACG CTGCGGAATT
65   AAAGAAATGG AAGGAGATCC GGCGTTTCTC CCTCATGACG CTGCGGAATT
29c  AAAGAGATGG AAGGAGATCC GGCGTTTCTG CCTCATGACT CTGCGGAATT
6b   AAAGAGATGG AAGGAGATCC GGCGTTTCTG CCTCATGACT CTGCGGAATT
11a  AAAGAGATGG AAGGAGATCC GGCGTTTCTC CCTCATGACG CTGCGGAATT

     401                                                450
2c   TTGGGATGGG GAAGAGGAGC ATtGAGGACC GTGTTCAAGA GGAAGCcCgC
2c8  TTGGGATGGG GAAGAGGAGC ATTGAGGACC GTGTTCAAGA GGAAGCTCAC
25   TTGGGATGGG GAAGAGGAGC ATTGAGGACC GTGTTCAAGA GGAAGCCCGC
65   TTGGGATGGG GAAGAGGAGC ATTGAGGACC GTGTTCAAGA GGAAGCCCGC
29c  TTGGGATGGG GAAGAGGAGC ATCGAGGACC GTGTTCAAGA GGAAGCCCGC
6b   TTGGGATGGG GAAGAGGAGC ATCGAGGACC GTGTTCAAGA GGAAGCCCGC
11a  TTGGGATGGG GAAGAGGAGC ATTGAGGACC GTGTTCAAGA GGAAGCCCGC

     451                                                500
2c   TGCCTTGTGG AGGAGTTGAG AAAAACCAAg GCcTCACCCT GTGATCCCAC
2c8  TGCCTTGTGG AGGAGTTGAG AAAAACCAAG GCTTCACCCT GTGATCCCAC
25   TGCCTTGTGG AGGAGTTGAG AAAAACCAAG GCCTCACCCT GTGATCCCAC
65   TGCCTTGTGG AGGAGTTGAG AAAAACCAAG GCCTCACCCT GTGATCCCAC
29c  TGCCTTGTGG AGGAGTTGAG AAAAACCAAT GCCTCACCCT GTGATCCCAC
6b   TGCCTTGTGG AGGAGTTGAG AAAAACCAAT GCCTCACCCT GTGATCCCAC
11a  TGCCTTGTGG AGGAGTTGAG AAAAACCAAG GCTTCACCCT GTGATCCCAC

     501                                                550
2c   TTTCATCCTG GGCTGTGCTC CCTGCAATGT GATCTGCTCc .TTaTTTTCC
2c8  TTTCATCCTG GGCTGTGCTC CCTGCAATGT GATCTGCTCC GTTGTTTTCC
25   TTTCATCCTG GGCTGTGCTC CCTGCAATGT GATCTGCTCC ATTATTTTCC
65   TTTCATCCTG GGCTGTGCTC CCTGCAATGT GATCTGCTCC ATTATTTTCC
29c  TTTCATCCTG GGCTGTGCTC CCTGCAATGT GATCTGCTCT GTTATTTTCC
6b   TTTCATCCTG GGCTGTGCTC CCTGCAATGT GATCTGCTCT GTTATTTTCC
11a  TTTCATCCTG GGCTGTGCTC CCTGCAATGT GATCTGCTCC ATTATTTTCC

     551                                                600
2c   AtaAaCG.TT tGATTATAAA GATCAG.aaT TTCTtAaCtT gATGaAAAaa
2c8  AGAAACGATT TGATTATAAA GATCAGAATT TTCTCACCCT GATGAAAAGA
25   ATAAACGTTT TGATTATAAA GATCAGCAAT TTCTTAACTT AATGGAAAAG
65   ATAAACGTTT TGATTATAAA GATCAGCAAT TTCTTAACTT AATGGAAAAG
29c  ATGATCGATT TGATTATAAA GATCAGAGGT TTCTTAACTT GATGGAAAAA
6b   ATGATCGATT TGATTATAAA GATCAGAGGT TTCTTAACTT GATGGAAAAA
11a  AGAAACGTTT CGATTATAAA GATCAGCAAT TTCTTAACTT GATGGAAAAA

FIG. 2-3.
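
Differences between aligned clones in Fig. 2 can be located column by column. The sketch below uses the first fifty aligned positions of the 2c8 and 25 rows from this sheet as sample input and lists the positions at which two equal-length rows differ. The function name is illustrative, and the reported positions are 1-based offsets within the excerpt, not coding-region coordinates.

```python
# Sample rows copied from the first block of this sheet (clones 2c8 and 25).
ROW_2C8 = "AAAGAGATGG AAGGAGATCC GGCGTTTCTC CCTCACAAAC TTGCGGAATT"
ROW_25 = "AAAGAAATGG AAGGAGATCC GGCGTTTCTC CCTCATGACG CTGCGGAATT"


def differing_positions(row_a: str, row_b: str) -> list[int]:
    """Return 1-based offsets at which two aligned rows carry different bases."""
    a = row_a.replace(" ", "")
    b = row_b.replace(" ", "")
    if len(a) != len(b):
        raise ValueError("rows must be aligned to the same length")
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x.upper() != y.upper()]


if __name__ == "__main__":
    print(differing_positions(ROW_2C8, ROW_25))
```

Spaces are ignored so that the ten-base grouping used on the drawing sheets does not affect the comparison.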
     601                                                650
2c   TT.AATGAAA ACaTCAgGAT TcTgAgC.cc [missing]  AG.TcTGCAA
2c8  TTCAATGAAA ACTTCAGGAT TCTGAACTCC CCATGGATCC AGGTCTGCAA
25   TTGAATGAAA ACATCAAGAT TTTGAGCAGC CCCTGGATCC AGATCTGCAA
65   TTGAATGAAA ACATCAAGAT TTTGAGCAGC CCCTGGATCC AGATCTGCAA
29c  TTCAATGAAA ACCTCAGGAT TCTGAGCTCT CCATGGATCC AGGTCTGCAA
6b   TTCAATGAAA ACCTCAGGAT TCTGAGCTCT CCATGGATCC AGGTCTGCAA
11a  TTGAATGAAA ACATCAGGAT TGTAAGCACC CCCTGGATCC AGATATGCAA

     651                                                700
2c   TAATTT.cCt CCt.TCATtG ATTattTCCC .GGAActCA. AAcAAAtTac
2c8  TAATTTCCCT CTACTCATTG ATTGTTTCCC AGGAACTCAC AACAAAGTGC
25   TAATTTTTCT CCTATCATTG ATTACTTCCC GGGAACTCAC AACAAATTAC
65   [illegible] CCTATCATTG ATTACTTCCC GGGAACTCAC AACAAATTAC
29c  [illegible] GCTCTCATCG ATTATCTCCC AGGAAGTCAT AATAAAATAG
6b   TAATTTCCCT GCTCTCATCG ATTATCTCCC AGGAAGTCAT AATAAAATAG
11a  TAATTTTCCC ACTATCATTG ATTATTTCCC GGGAACCCAT AACAAATTAC

     701                                                750
2c   tTaAAAA.gT TGCTtttAtg aaAAGTtAta TtttGGAgAa AgTAAAAGAA
2c8  TTAAAAATGT TGCTCTTACA CGAAGTTACA TTAGGGAGAA AGTAAAAGAA
25   TTAAAAACGT TGCTTTTATG AAAAGTTATA TTTTGGAAAA AGTAAAAGAA
65   TTAAAAACGT TGCTTTTATG AAAAGTTATA TTTTGGAAAA AGTAAAAGAA
29c  CTGAAAATTT TGCTTACATT AAAAGTTATG TATTGGAGAG AATAAAAGAA
6b   CTGAAAATTT TGCTTACATT AAAAGTTATG TATTGGAGAG AATAAAAGAA
11a  TTAAAAACGT TGCTTTTATG GAAAGTGATA TTTTGGAGAA AGTAAAAGAA

     751                                                800
2c   CAcCAAGaAT Ca.TGGAcaT gAACAa.cCT CgGGACTTTA TtGATTGcTT
2c8  CACCAAGCAT CACTGGATGT TAACAATCCT CGGGACTTTA TGGATTGCTT
25   CACCAAGAAT CAATGGACAT GAACAACCCT CAGGACTTTA TTGATTGCTT
65   CACCAAGAAT CAATGGACAT GAACAACCCT CAGGACTTTA TTGATTGCTT
29c  CATCAAGAAT CCCTGGACAT GAACAGTGCT CGGGACTTTA TTGATTGTTT
6b   CATCAAGAAT CCCTGGACAT GAACAGTGCT CGGGACTTTA TTGATTGTTT
11a  CACCAAGAAT CGATGGACAT CAACAACCCT CGGGACTTTA TTGATTGCTT

     801                                                850
2c   CCTGATcAAA ATGGAg.AGG AAAAGcAcAA cCAAcagTCt GAATTtAcTa
2c8  CCTGATCAAA ATGGAGCAGG AAAAGGACAA CCAAAAGTCA GAATTCAATA
25   CCTGATGAAA ATGGAGAAGG AAAAGCACAA CCAACCATCT GAATTTACTA
65   CCTGATGAAA ATGGAGAAGG AAAAGCACAA CCAACCATCT GAATTTACTA
29c  CCTGATCAAA ATGGAACAGG AAAAGCACAA TCAACAGTCT GAATTTACTG
6b   CCTGATCAAA ATGGAACAGG AAAAGCACAA TCAACAGTCT GAATTTACTG
11a  [illegible]

FIG. 2-4.
     851                                                900
2c   TTGAAAgCTT Ggta..CACT G.AgcTGA.C TgtTTGgaGC TGG.ACAGAG
2c8  TTGAAAACTT GGTTGGCACT GTAGCTGATC TATTTGTTGC TGGAACAGAG
25   TTGAAAGCTT GGAAAACACT GCAGTTGACT TGTTTGGAGC TGGGACAGAG
65   TTGAAAGCTT GGAAAACACT GCAGTTGACT TGTTTGGAGC TGGGACAGAG
29c  TTGAAAGCTT GATAGCCACT GTAACTGATA TGTTTGGGGC TGGAACAGAG
6b   TTGAAAGCTT GATAGCCACT GTAACTGATA TGTTTGGGGC TGGAACAGAG
11a  TTGAAAACTT GGTAATCACT GCAGCTGACT TACTTGGAGC TGGGACAGAG

     901                                                950
2c   ACaACaAGCA C.AC.CTGAG ATATG..CTC CT.CTCCTGC TGAAGcACCC
2c8  ACAACAAGCA CCACTCTGAG ATATGGACTC CTGCTCCTGC TGAAGCACCC
25   ACGACAAGCA CAACCCTGAG ATATGCTCTC CTTCTCCTGC TGAAGCACCC
65   ACGACAAGCA CAACCCTGAG ATATGCTCTC CTTCTCCTGC TGAAGCACCC
29c  ACAACGAGCA CCACTCTGAG ATATGGACTC [illegible] TGAAGTACCC
6b   ACAACGAGCA CCACTCTGAG ATATGGACTC CTGCTCCTGC TGAAGTACCC
11a  ACAACAAGCA CAACCCTGAG ATATGCTCTC CTTCTCCTGC TGAAGCACCC

     951                                               1000
2c   AGAGGTCACA GCTAAAGTCC AGGAAGAGAT TGAacgTGTa aTTGGCAGAa
2c8  AGAGGTCACA GCTAAAGTCC AGGAAGAGAT TGATCATGTA ATTGGCAGAC
25   AGAGGTCACA GCTAAAGTCC AGGAAGAGAT TGAACGTGTG ATTGGCAGAA
65   AGAGGTCACA GCTAAAGTCC AGGAAGAGAT TGAACGTGTG ATTGGCAGAA
29c  AGAGGTCACA GCTAAAGTCC AGGAAGAGAT TGAATGTGTA GTTGGCAGAA
6b   AGAGGTCACA GCTAAAGTCC AGGAAGAGAT TGAATGTGTA GTTGGCAGAA
11a  AGAGGTCACA GCTAAAGTCC AGGAAGAGAT TGAACGTGTG ATTGGCAGAA

     1001                                              1050
2c   ACcGGAGCCC CTGcATGCAg GAcAGGaGcC ACATGCCcTA CACaGATGCT
2c8  ACAGGAGCCC CTGCATGCAG GATAGGAGCC ACATGCCTTA CACTGATGCT
25   ACCGGAGCCC CTGCATGCAA GACAGGAGCC ACATGCCCTA CACAGATGCT
65   ACCGGAGCCC CTGCATGCAA GACAGGAGCC ACATGCCCTA CACAGATGCT
29c  ACCGGAGCCC CTGTATGCAG GACAGGAGTC ACATGCCCTA CACAGATGCT
6b   ACCGGAGCCC CTGTATGCAG GACAGGAGTC ACATGCCCTA CACAGATGCT
11a  ACCGGAGCCC CTGCATGCAG GACAGGGGCC ACATGCCCTA CACAGATGCT

     1051
2c   GTgGTGCACG AG.TCCAGAG ATACattGAC CT.cTCCCCA
2c8  GTAGTGCACG AGATCCAGAG ATACAGTGAC CTTGTCCCCA
25   [missing]  AGGTCCAGAG ATACCTTGAC CTTCTCCCCA
65   GTGGTGCACG AGGTCCAGAG ATACATTGAC CTTCTCCCCA
29c  GTGGTGCACG AGATCCAGAG ATACATTGAC CTCCTCCCCA
6b   [missing]  AGATCCAGAG ATACATTGAC CTCCTCCCCA
11a  [missing]  [missing]  ATACATCGAC CTCATCCCCA

FIG. 2-5.
     1101                                              1150
2c   [illegible]
2c8  [illegible]
25   [illegible]
65   [illegible]
29c  [illegible]
6b   CCATGCAGTG ACCTGTGATG TTAAATTCAA AAACTACCTC ATCCCCAAGG
11a  CCATGGAGTG ACCTGTGACG TTAAATTCAG [illegible]

     1151                                              1200
2c   GCAcaACCAT [illegible]
2c8  GCACAACCAT [illegible]
25   [illegible]
65   [illegible]
29c  [illegible]
6b   [illegible] AATAACATCC CTGACTTCTG TGCTGCACAA [illegible]
11a  [illegible] [missing]  CTCACTTCTG TGCTACATGA [illegible]

     1201                                              1250
2c   [illegible]
2c8  [illegible]
25   [illegible]
65   TTTCCCAACC CAGAGATGTT [illegible]
29c  TTCCCCAACC CAGAGATGTT [illegible]
6b   TTCCCCAACC CAGAGATGTT TGACCCTGGC CACTTTCTGG ATAAGAGTGG
11a  TTTCCCAACC CAGAGATGTT TGACCCTCGT CACTTTCTGG [illegible]

     1251                                              1300
2c   CAA.TTTAAG AAAAGT.AcT [illegible]
2c8  CAACTTTAAG AAAAGTGACT ACTTCATGCC [illegible]
25   CAATTTTAAG AAAAGTAAAT ACTTCATGCC [illegible]
65   CAATTTTAAG AAAAGTAAAT [illegible]
29c  CAACTTTAAG AAAAGTGACT ACTTCATGCC [illegible]
6b   [illegible]
11a  AAATTTTAAG AAAAGTAACT ACTTCATGCC TTTCTCAGCA [illegible]

     1301                                              1350
2c   [illegible] [illegible] GCCcGCATGG [illegible] ATTcCTgACC
2c8  TTTGTGCAGG AGAAGGACTT GCCCGCATGG AGCTATTTTT ATTTCTAACC
25   [illegible] [illegible] [illegible] AGCTGTTTTT ATTCCTGACC
65   [illegible] AGAAGCCCTG GCCCGCATGG AGCTGTTTTT ATTCCTGACC
29c  [illegible] [illegible] GCCCGCATGG AGCTGTTTTT ATTCCTGACC
6b   TGTGTATGGG [illegible] [illegible] [illegible] ATTCCTGACC
11a  [illegible] [illegible] [illegible] [illegible] ATTCCTGACC

FIG. 2-6.
     1351                                              1400
2c   .ccATTTTaC AGAACTTTAA CCTGAAATCT ctggtTGAcc cAAAG.AccT
2c8  ACAATTTTAC AGAACTTTAA CCTGAAATCT GTTGATGATT TAAAGAACCT
25   TCCATTTTAC AGAACTTTAA CCTGAAATCT CTGGTTGACC CAAAGAACCT
65   TCCATTTTAC AGAACTTTAA CCTGAAATCT CTGGTTGACC CAAAGAACCT
29c  ACCATTTTGC AGAACTTTAA CCTGAAATCT CAGGTTGACC CAAAGGATAT
6b   ACCATTTTGC AGAACTTTAA CCTGAAATCT CAGGTTGACC CAAAGGATAT
11a  TTCATTTTAC AGAACTTTAA CCTGAAATCT CTGATTGACC CAAAGGACCT

     1401                                              1450
2c   tgAcAccACt cCagTTg.CA AtGgatTTGc ttcTgTgCC. CCCTtcTAcC
2c8  CAATACTACT GCAGTTACCA AAGGGATTGT TTCTCTGCCA CCCTCATACC
25   TGACACCACT CCAGTTGTCA ATGGITTTGC CTCTGTGCCG CCCTTCTACC
65   TGACACCACT CCAGTTGTCA ATGGATTTGC CTCTGTGCCG CCCTTCTACC
29c  TGACATCACC CCCATTGCCA ATGCATTTGG TCGTGTGCCA CCCTTCTACC
6b   TGACATCACC CCCATTGCCA ATGCATTTGG TCGTGTGCCA CCCTTCTACC
11a  TGACACAACT CCTGTTGTCA ATGGATTTGC TTCTGTCCCG CCCTTCTATC

     1451                                              1500
2c   AGcT.TGCTT CATtCCTGTC TGAAGAAggg cAGatggtcT GGCTGCT.cT
2c8  AGATCTGCTT CATCCCTGTC TGAAGAATGC TAGCCCATCT GGCTGCTGAT
25   AGCTGTGCTT CATTCCTGTC TGAAGAAGAG CAGATGGCCT GGCTGCTGCT
65   AGCTGTGCTT CATTCCTGTC TGAAGAAGAG CAGATGGCCT GGCTGCTGCT
29c  AGCTCTGCTT CATTCCTGTC TGAAGAAGGG CAGATAGTTT GGCTGCTGCT
6b   AGCTCTGCTT CATTCCTGTC TGAAGAAGGG CAGATAGTTT GGCTGCTGCT
11a  AGCTGTGCTT CATTCCTGTC TGAAGAAGCA CAGATGGTCT GGCTGCTGCT

     1501                                              1550
2c   cTGCtgTC.C [illegible]
2c8  CTGCTATCAC CTGCAACTCT TTTTTTATCA AGGACATTCC CACTATTATG
25   GTGCAGTCCC TGCAGCTCTC TTTCCTCTGG GGCATTATCC ATCTTTCACT
65   GTGCAGTCCC TGCAGCTCTC TTTCCTCTGG GGCATTATCC ATCTTTCACT
29c  GTGCTGTCAC CTGCAATTCT CCCTTATCAG GGCCATTAGC CTCTCCCTTC
6b   GTGCTGTCAC CTGCAATTCT CCCTTATCAG GGCCATTGGC CTCTCCCTTC
11a  GTGCTGTCCC TGCAGCTCTC TTTCCTCTGG TCCAAATTTC ACTATCTGTG

     1551                                              1600
2c   [illegible]
2c8  TCTTCTCTGA CCTCTCATCA AATCTTCCCA TTCACTCAAT ATCCCATAAG
25   ATCTGTAATG CCTTTTCTCA CCTGTCATCT CACATTTTCC CTTCCCTGAA
65   ATCTGTAATG CCTTTTCTCA CCTGTCATCT CACATTTTCC CTTCCCTGAA
29c  TCTCTGTGAG GGATATTTTC TCTGACTTGT CAATCCACAT CTTCCCATTC
6b   TCTCTATGAG GGATATTTTC TCTGACTTGT CAATCCACAT CTTCCCATTC
11a  TGACCCGTCA [illegible]

FIG. 2-7.
     1601                                              1650
2c   [illegible]
2c8  [illegible] CCATTAAGGA GAGTTGTTCA GGTCACTGCA CAAATATATC
25   GATCTAGTGA ACATTCGACC TTCATTACGG AGAGTTTCCT ATGTTTCACT
65   GATCTAGTGA ACATTCGACC TCCATTACTT AGAGTTTCCT ATGTTTCACT
29c  CCTCAAGATC CAATGAACAT CCAACCTCCA TTAAAGAGAG TTTCTTGGGT
6b   CCTCAAGATC CAATGAACAT CCAACCTCCA TTAAAGAGAG TTTCTTGGGT
11a  TGAACATTCA GCCTCCATTA AAAAAGTTTC ACTGTGCAAA TATATCTGCT

     1651                                              1700
2c   [illegible]
2c8  [illegible] CATACTCTGT AACACTTGTA TTAATTGCTG CATATGCTAA
25   GTGCAAATAT ATCTGCTATT CTCCATACTC TGTAACAGTT GCATTGACTG
65   GTGCAAATAT ATCTGCTATT CTCCATACTC TGTAACAGTT GCATTGACTG
29c  CACTTCCTAA ATATATCTGC TATTCTCCAT ACTCTGTATC ACTTGTATTG
6b   CACTTCCTAA ATATATCTGC TATTCTCCAT ACTCTGTATC ACTTGTATTG
11a  ATTCCCCATA [illegible] GTTACATTGA GTGCCACATA ATGCTGATAC

     1701                                              1750
2c   [illegible]
2c8  TACTTTTCTA ATGCTGACTT TTTAATATGT TATCACTGTA AAACACAGAA
25   TCACATAATG CTCATACTTA TCTAATGTTG AGTTATTAAT ATGTTATTAT
65   TCACATAATG CTCATACTTA TCTAATGTTG AGTTATTAAT ATGTTATTAT
29c  ACCACCACAT ATGCTAATAC CTATCTACTG CTGAGTTGTC AGTATGTTAT
6b   ACCACCACAT ATGCTAATAC CTATCTACTG CTGAGTTGTC AGTATGTTAT
11a  TTGTCTAATG TTGAGTTATT AACATATTAT TATTAAATAG A

     1751                                              1800
2c   [illegible]
2c8  AAGTGATTAA TGAATGATAA TTTAGTCCAT TTCTTTTGTG [illegible]
25   TAAATAGAGA AATATGATTT GTGTATTATA ATTCAAAGGC ATTTCTTTTC
65   TAAATAGAGA AATATGATTT GTGTATTATA ATTCAAAGGC ATTTCTTTTC
29c  CACTAGAAAA CAAAGAAAAA TGATTAATAA [illegible] AGAGCCAAAA
6b   [illegible] CAAAGAAAAA TGATTAATAA [illegible] AGAGCCATTT

     1801                                              1850
2c   [illegible]
2c8  [illegible]
25   TGCATGTTCT [illegible] CATTATTATT [illegible]
65   TGCATGTTCT [illegible] CATTATTATT [illegible]
29c  AAAAAAAAAA
6b   ATTCTCTGCA TGCTCTAGAT AAAAATGATT ATTATTTACT GGGTCAGTTC

FIG. 2-8.
     1851                                              1900
6b   TTAGATTTCT TTCTTTTGAG TAAAATGAAA GTAAGAAATG AAAGAAAATA

     1901                                              1950
6b   GAATGTGAAG AGGCTGTGCT GGCCCTCATA GTGTTAAGCA CAAAAAGGGA

     1951                                              2000
6b   GAAAGGTAAG AGGGTAGGAA AGCTGTTTTA GCTAAATGCC ACCTAGAGTT

     2001                                              2050
6b   ATTGGAGGTC TGAATTTGGA AAAAAAAACT ATGTCCAGGA GAACATTAAG

     2101                                              2150
6b   TGTTTGAATT CATGCTCTGC TTTTGTGTTA CTGTAAACAC AAGATCAAGA

     2151                                              2200
6b   TTTGGATAAT CTTTTTCCTT TGTGTTTCCA ACTTAGATCA TGTCTAAATA

     2201       2216
6b   lATGCTTTCA TATGGC

FIG. 2-9.
ATGGTGATG7 AGnAAnTCAT nCCATCTTAT ATTTCnAGAG TGTAGAGGAG
GATTGTTGnG GAAGTAAGAG GnnTAAGATA GAGATGCnTT TATACTATCC 



CAAGCAGGGA TrAGTCTAGG AAATGATTaT CGTCttTGAT TCTCTTGTCA 

GrAttTTCTT TCTCmnATCT TGtATAATCA GAGaatTACT ACACATGgAC 

AATrAarATT TCCCCnTCcA GAtAnACaAt ATATTTTATT TATATTTATA 

GTTTTAAATT ACAACCAGAG CTTGGCATAT TGTATCTATA CCTTTAATAA 

INTRON 4 | EXON 5
ATGCTTTTAA TTTAATAAAT TATTGTTTTC TCTTAdATAT GCAATAATTT 

A ^ 
TCCCACTATC ATTGATTATT TCCCGGGAAC CCATAACAAA TTACTTAAAA 

'681 

ACCTTGCTTT TATGGAAAGT GATATTTTGG AGAAAGTAAA AGAACACCAA
GAATCGATGG ACATCAACAA CCCTCGGGAC TTTATTGATT GCTTCCTGAT 



CAAAATGGAG AAGjGTAAAAT GTTAACAAAA GCTTAGTTAT GTGACTGCTT 
GCGTATkTGT GATTCATTGA CTAGTTGkGT GTTTACTACG GATGTTTAAC 
AGGTCAAGGA GTAATGCTTG AGAAGCATAT TTAAGTTTTt ATTGTaTGCA 
TGAATATCCA GTAAGCATCA TAGAAAATGT AAAATTAAnT TGtTAaATAa
TTAGAaTACA TAGAAGAAAT tGTTtAGATA AATATnATCT ATCTGAACAA 
TAAGGATGTC AGGATAGGAA AAGCTCTGTT TCTGCAGCTT CCAGTGGAGA 
TCAGCACAGG AGGGAACTTA TTTTTT 




FIG. 16. 
agggaaaagacaaataggccggggatgnaaatttagcatgtgagcaacc wt

ttanttaaccagctaggctgtaattgntaattcgagantaatgtnaaagt wt 

gatgtgttgattttatgcatgccnnactcntttttgcttttaaggggagt wt 

cataggtaagatattacttaaaatttctaaactat tattatrtcTfrAprr wt 

a^tatqa agtgttttatatctaatgtttactcatattttaaaattgtttc wt 
I ' I I I I I I I I I M I I I I I I I M I M M I I I I I I M M I I I I I I 
atgaagtgttttatatctaatgtttactcatattttaaaattgtttc mutant 

SerProCysAspProThrPheIleLeuGlyCysAlaP
caatcatttagCTTCACCCTGTGATCCCACTTTCATCCTGGGCTGTGCTC wt
||||||||||||||||||||||||||||||||||||||||||||||||||
caatcatttagCTTCACCCTGTGATCCCACTTTCATCCTGGGCTGTGCTC mutant
-^482

roCysAsnValIleCysSerIleIlePheGlnLysArgPheAspTyrLys
CCTGCAATGTGATCTGCTCCATTATTTTCCAGAAACGTTTCGATTATAAA wt
||||||||||||||||||||||||||||||||||||||||||||||||||
CCTGCAATGTGATCTGCTCCATTATTTTCCAGAAACGTTTCGATTATAAA mutant
[His]

AspGlnGlnPheLeuAsnLeuMetGluLysLeuAsnGluAsnIleArgIl
GATCAGCAATTTCTTAACTTGATGGAAAAATTGAATGAAAACATCAGGAT wt
||||||||||||||||||||||||||||||||||||||||||||||||||
GATCAGCAATTTCTTAACTTGATGGAAAAATTGAATGAAAACATCAGGAT mutant

eValSerThrProTrpIleGln
TGTAAGCACCCCCTGGATCCAGgtaaggacaagttttgtgcttcctgaga wt
||||||||||||||| ||||||||||||||||||||||||||||||||||
TGTAAGCACCCCCTGAATCCAGgtaaggacaagttttgtgcttcctgaga mutant
End "642 

aaccacttacagtctttttttctgggaaatccaaaattcta tattaarr;^ wt 
I I I I II M I I M N I I I I I I I I I I I M II I II I I I I I I I I M II I 
aaccacttacagtctttttttctgggaaatccaaaattctatatt mutant 

aqccctqarigr acatttgtgaatactacagtcttgcctagacagccatggggt wt 

gaatatctggaaaagatggcaaagntctttattttatgcacaggaaatgaata wt 

tcccaatatagatcaggcttctaagcccattagctccctgatcagtgttt wt 

FIG. 17. 
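
The effect of the substitution marked "End" in Fig. 17 can be checked with a short sketch. It translates an in-frame excerpt of the wt and mutant rows with a partial codon table that lists only the codons present in the excerpt. The reading frame is taken from the three-letter amino-acid line printed above the sequence, and the function name `translate` is illustrative.

```python
# Partial codon table: only the codons that occur in the excerpt below.
CODONS = {
    "GTA": "Val", "AGC": "Ser", "ACC": "Thr", "CCC": "Pro",
    "TGG": "Trp", "ATC": "Ile", "CAG": "Gln", "TGA": "End",
}

# In-frame excerpts copied from the wt and mutant rows of Fig. 17
# (the codons for ValSerThrProTrpIleGln at the end of the exon).
WT = "GTAAGCACCCCCTGGATCCAG"
MUT = "GTAAGCACCCCCTGAATCCAG"


def translate(seq: str) -> list[str]:
    """Translate codon by codon, stopping after the first stop codon."""
    peptide = []
    for i in range(0, len(seq) - 2, 3):
        residue = CODONS[seq[i:i + 3]]
        peptide.append(residue)
        if residue == "End":
            break
    return peptide


if __name__ == "__main__":
    print("wt    :", "".join(translate(WT)))
    print("mutant:", "".join(translate(MUT)))
```

The single G-to-A change converts the Trp codon TGG into the stop codon TGA, so translation of the mutant excerpt ends at that position.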